Kafka--JAVA API(Producer和Consumer)
Kafka 版本2.11-0.9.0.0
producer
package com.yzy.spark.kafka;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;
import java.util.Properties;
public class KafkaProducer extends Thread{
private String topic;
//--1
private Producer producer;
public KafkaProducer(String topic){
this.topic=topic;
Properties properties = new Properties(); //--2
properties.put("metadata.broker.list",KafkaProperties.BROKER_LIST);
properties.put("serializer.class","kafka.serializer.StringEncoder");
properties.put("request.require.acks","1");
// properties.put("partitioner.class","com.yzy.spark.kafka.MyPartition");
ProducerConfig config=new ProducerConfig(properties);
producer=new Producer<>(config);
}
@Override
public void run() {
int messageNo=1;
while (true){
String message="message"+ messageNo;
producer.send(new KeyedMessage("test2",String.valueOf(messageNo),message)); //--4
System.out.println("Sent:"+message);
messageNo++;
try {
Thread.sleep(1000);
}catch (InterruptedException e) {
e.printStackTrace();
}
}
}
}
1.定义Producer对象,这里要注意泛型类型,之后的KeyedMessage的泛型类型和Producer相同。
2.创建Producer对象需要传入一个ProducerConfig对象,而ProducerConfig对象需要由Properties对象构造,properties的属性设置可以查看ProducerConfig源码,注释很详细(个别属性在ProducerConfig父类AsyncProducerConfig 和 SyncProducerConfigShared中)。
3.该属性可以指定partitioner,如果不设置默认会设为kafka.producer.DefaultPartitioner。
4.看看KeyedMessage的源码:
case class KeyedMessage[K, V](topic: String, key: K, partKey: Any, message: V) {
if(topic == null)
throw new IllegalArgumentException("Topic cannot be null.")
def this(topic: String, message: V) = this(topic, null.asInstanceOf[K], null, message)
def this(topic: String, key: K, message: V) = this(topic, key, key, message)
def partitionKey = {
if(partKey != null)
partKey
else if(hasKey)
key
else
null
}
def hasKey = key != null
}
参数有4个,topic必填,message是消息,通常只填这两个参数即可发送消息。key和partKey是用于partition的参数,partKey的优先级高于key,但是partKey只对当前消息起作用,key和partKey只能是String类型。下面来看看partition策略和key。
partition
先在服务器端将topic test2的partitions设定为3
kafka-topics.sh --alter --zookeeper localhost:2181 --partitions 3 --topic test2
然后回到客户端看看kafka.producer.DefaultPartitioner源码
package kafka.producer
import kafka.utils._
import org.apache.kafka.common.utils.Utils
class DefaultPartitioner(props: VerifiableProperties = null) extends Partitioner {
private val random = new java.util.Random
def partition(key: Any, numPartitions: Int): Int = {
Utils.abs(key.hashCode) % numPartitions
}
}
该类有一个方法 def partition(key: Any, numPartitions: Int),第一个参数为上文所说的key或partKey,第二个为partitions的数量,传入的值就是在服务器设置的值(3),将key的hashCode对numPartitions取余得到结果(选择对应编号的partition)
我们可以自己定义一个partition.class并配置到properties属性中,这里给一个简单的例子:
package com.yzy.spark.kafka;
import kafka.producer.Partitioner;
import kafka.utils.VerifiableProperties;
public class MyPartition implements Partitioner {
public MyPartition(VerifiableProperties properties){
}
@Override
public int partition(Object key, int numPartitions) {
System.out.println("numPartitions:"+numPartitions);
return key.hashCode()%numPartitions;
}
}
Consumer
package com.yzy.spark.kafka;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
public class KafkaConsumer extends Thread{
private String topic;
private String groupId;
public KafkaConsumer(String topic,String groupId){
this.topic=topic;
this.groupId=groupId;
}
private ConsumerConnector createConnector(){
Properties properties=new Properties();//--1
properties.put("zookeeper.connect",KafkaProperties.ZK);
properties.put("group.id",groupId);
properties.put("auto.offset.reset", "largest");//--2
ConsumerConfig consumerConfig = new ConsumerConfig(properties);
return Consumer.createJavaConsumerConnector(consumerConfig);
}
@Override
public void run() {
ConsumerConnector consumerConnector=this.createConnector();
Map topicCountMap=new HashMap<>();
topicCountMap.put(topic,1);
Map>> messageStreams = consumerConnector.createMessageStreams(topicCountMap);
KafkaStream stream = messageStreams.get(topic).get(0);
ConsumerIterator iterator = stream.iterator();
while(iterator.hasNext()){
String message=new String(iterator.next().message());
}
}
}
Consumer相关的东西比较多,涉及到group和partition机制,以官方文档为准。
1.properties和producer一样看源码配置。
2.这个属性和shell命令中的--from-beginning对应。可以填smallest(从头读取)和largest(默认值,读取最新的元素,严格来说是最新的offset位置开始读取)。注意:每一次一个新的consumer试图去消费一个topic时,都是从所在group的largest offset位置读取,即也可通过设置group.id来实现from-beginning,只要将每个consumer的group.id都设置为一个新值即可,例如properties.put("group.id", UUID.randomUUID().toString());