In your Kafka server.properties there is a commented-out configuration #advertised.host.name=<Some IP> Uncomment this and add the IP of the Linux machine on which Kafka is running: advertised.host.name=<Kafka Running Machine IP> Then connect from clients to <Kafka Running Machine IP>. This should fix your issue. EDIT Optionally you can uncomment the...
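For reference, a minimal sketch of the relevant server.properties lines, assuming 192.168.1.100 is a placeholder for the broker machine's IP:

    # config/server.properties
    advertised.host.name=192.168.1.100

    # clients then point at that address, e.g. in the producer config:
    # metadata.broker.list=192.168.1.100:9092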
pipeline,apache-kafka,amazon-redshift,data-partitioning
Assuming: you have multiple interleaved sessions; you have some kind of session id to identify and correlate separate events; you're free to implement consumer logic; absolute ordering of merged events is not important. Wouldn't it then be possible to use separate topics with the same number of partitions for the...
java,logging,storm,apache-kafka,kafka
Solved this by adding log4j-over-slf4j-1.6.6.jar to my Storm library.
The earlier versions of Kafka came with a default serializer, but that created a lot of confusion. With 0.8.2, you need to pick a serializer yourself, either the StringSerializer or the ByteArraySerializer that come with the API, or build your own. The API serializers can be found at StringSerializer: http://kafka.apache.org/082/javadoc/org/apache/kafka/common/serialization/StringSerializer.html ByteArraySerializer: http://kafka.apache.org/082/javadoc/org/apache/kafka/common/serialization/ByteArraySerializer.html So, your...
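As a rough sketch of wiring in the StringSerializer with the 0.8.2 producer API (the broker address and topic name are placeholders):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class StringProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            KafkaProducer<String, String> producer = new KafkaProducer<String, String>(props);
            producer.send(new ProducerRecord<String, String>("my-topic", "some-key", "some-value"));
            producer.close();
        }
    }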
architecture,messaging,apache-kafka
There is nothing wrong with big messages in Kafka. One potential problem is that brokers and consumers have to decompress messages and therefore use their RAM. So if the size is big, it can put pressure on RAM (but I am not sure what size would give you visible results)....
java,apache,storm,apache-kafka,kafka-consumer-api
You have this problem because your MD5 hash is incorrect. You say that if you convert your byte array to a Java String it works; that is because the MD5 value is correct when computed over the String. collector.emit(tuple, new Values(Obj.hashMD5(key), key)); As you can see the MD5 is calculated...
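To illustrate the point with a standalone sketch (this is not the original Obj.hashMD5 helper): hashing the raw bytes and hashing the bytes of the String built from them give different digests whenever the byte-to-String conversion is not lossless.

    import java.security.MessageDigest;

    public class Md5Demo {
        public static void main(String[] args) throws Exception {
            byte[] raw = new byte[] { (byte) 0xC3, (byte) 0x28, (byte) 0xA0, (byte) 0xFF }; // not valid UTF-8
            byte[] viaString = new String(raw, "UTF-8").getBytes("UTF-8"); // lossy round trip

            MessageDigest md = MessageDigest.getInstance("MD5");
            System.out.println(toHex(md.digest(raw)));        // MD5 of the original bytes
            System.out.println(toHex(md.digest(viaString)));  // MD5 after the String round trip -- differs
        }

        private static String toHex(byte[] bytes) {
            StringBuilder sb = new StringBuilder();
            for (byte b : bytes) sb.append(String.format("%02x", b));
            return sb.toString();
        }
    }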
If you have the choice of using a server capable of writing directly to Kafka (or integrating a producer with your application code) and it wouldn't have any other drawbacks, I would definitely do that to avoid the whole log file parsing step. In this case you would connect any analytics...
apache-spark,apache-kafka,spark-streaming
The transformations declared on the StreamHandler will be applied to each batch of the DStream. The current code is too incomplete to give you a definitive answer. In the DStream transformation pipeline you will need an action that materializes the DStream, otherwise nothing will happen. Regarding the approach, a function...
I believe from 0.8.2 they have introduced org.apache.kafka.clients.producer.ProducerRecord<K,V>, which takes a topic name, an optional partition number, and an optional key and value. You can typically use it like ProducerRecord<String,String> producerRecord = new ProducerRecord<String,String>(topic, key, value); From the doc: If a valid partition number is specified that partition will be...
java,apache-kafka,apache-samza
Still not working, but I have gone one step further. When executed from a path, the Samza-provided scripts expect to be located in a /bin/ folder and to have a sibling /lib/ folder where all the Samza .jar files are located. I am still having some...
java,osx,apache-kafka,kafka-consumer-api
Try the following: run ping HQSML-142453 If the ping is not working, that means you don't have the hostname configured in /etc/hosts or in your router's DNS. So you must either edit /etc/hosts and map your HQSML-142453 name to the IP address where your Kafka is running, or...
java,apache-kafka,kafka-consumer-api
Do you have the below code in your main()? If so, removing it would solve the issue. We hit the same issue because the main thread tries to shut down the consumer after 10 seconds. You may have to implement graceful shutdown/startup using Apache Commons CLI and Apache Commons Daemon. try...
Prabhat, in your specific case, a possible explanation of the behavior you observe is that you started Kafka without the auto-create-topics option set to true. The export process requires Kafka to have this enabled to be able to create topics on the fly. If not you will have...
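A sketch of the two ways around it (the topic name below is a placeholder):

    # config/server.properties -- let topics be created on first use
    auto.create.topics.enable=true

    # or create the topic up front
    bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic my-export-topic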
scala,apache-spark,apache-kafka,spark-streaming
From the documentation of java.lang.AbstractMethodError: Normally, this error is caught by the compiler; this error can only occur at run time if the definition of some class has incompatibly changed since the currently executing method was last compiled. This means that there's a version incompatibility between the compile and runtime...
Any of them. Kafka itself is a Java process, so to run it you need enough memory to support a JVM and the Kafka process itself. You're going to get that with any of the Digital Ocean host sizes. A Kafka host generally ends up saturating its network connection before...
Kafka is agnostic to the message content and doesn't provide any special means to enrich it, so this is something you need to do yourself. A common way of dealing with these things is to use a structured format such as JSON, Avro or similar, where you are free to...
java,hadoop,hbase,zookeeper,apache-kafka
Replacing server.server1 with server.1 and modifying the myid file accordingly for each node did the trick.
elasticsearch,apache-kafka,spark-streaming
I found, after many retries, a way to write to ElasticSearch without getting any error. Basically passing the parameter "es.batch.size.entries" -> "1" to the saveToES method solved the problem. I don't understand why using the default or any other batch size leads to the aforementioned error considering that I would...
If I understood you correctly, you could use the HighwaterMarkOffset from the FetchResponse. That way you will know the offset at the end of the partition and will be able to compare it with your current acked offset, or with the offset of the last message in this FetchResponse, for example. Details...
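A rough sketch with the 0.8 SimpleConsumer (host, topic, partition and offsets are placeholders): issue a fetch, then compare the high-water mark with the offset you have consumed so far.

    import kafka.api.FetchRequest;
    import kafka.api.FetchRequestBuilder;
    import kafka.javaapi.FetchResponse;
    import kafka.javaapi.consumer.SimpleConsumer;

    public class LagCheck {
        public static void main(String[] args) {
            String topic = "my-topic";   // placeholder
            int partition = 0;
            long currentOffset = 42L;    // e.g. your last acked offset

            SimpleConsumer consumer = new SimpleConsumer("broker-host", 9092, 100000, 64 * 1024, "lag-check");
            FetchRequest req = new FetchRequestBuilder()
                    .clientId("lag-check")
                    .addFetch(topic, partition, currentOffset, 100000)
                    .build();
            FetchResponse resp = consumer.fetch(req);

            long highWatermark = resp.highWatermark(topic, partition); // offset at the end of the partition
            System.out.println("remaining messages = " + (highWatermark - currentOffset));
            consumer.close();
        }
    }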
apache,storm,apache-kafka,flume
I am assuming you are dealing with a use case of continuous computation algorithms or real-time analytics. Given below is what you will have to go through if you DO NOT use Kafka or any message queue: (1) You will have to implement functionality like consistency of data. (2)...
apache-kafka,kafka-consumer-api
Your question is a little hard to understand, but anyway, I suppose you are asking what the offset will be when you query for the earliest offset. Kafka has log retention configurations that allow you to set a time to live for messages or a maximum log size. More here. Imagine you...
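As a sketch of how you would ask for the earliest still-available offset with the SimpleConsumer API (broker, topic and partition are placeholders); once retention has deleted old segments this value will be greater than 0:

    import java.util.HashMap;
    import java.util.Map;

    import kafka.api.PartitionOffsetRequestInfo;
    import kafka.common.TopicAndPartition;
    import kafka.javaapi.OffsetRequest;
    import kafka.javaapi.OffsetResponse;
    import kafka.javaapi.consumer.SimpleConsumer;

    public class EarliestOffset {
        public static void main(String[] args) {
            SimpleConsumer consumer = new SimpleConsumer("broker-host", 9092, 100000, 64 * 1024, "offset-check");
            TopicAndPartition tp = new TopicAndPartition("my-topic", 0);

            Map<TopicAndPartition, PartitionOffsetRequestInfo> requestInfo =
                    new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
            // EarliestTime() = -2, i.e. the first offset that is still on disk
            requestInfo.put(tp, new PartitionOffsetRequestInfo(kafka.api.OffsetRequest.EarliestTime(), 1));

            OffsetRequest request = new OffsetRequest(requestInfo,
                    kafka.api.OffsetRequest.CurrentVersion(), "offset-check");
            OffsetResponse response = consumer.getOffsetsBefore(request);
            long[] offsets = response.offsets("my-topic", 0);
            System.out.println("earliest available offset = " + offsets[0]);
            consumer.close();
        }
    }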
java,eclipse,jar,apache-kafka,kafka-consumer-api
For completeness' sake, the reason why I was seeing the error was that I didn't have the jars specified in the classpath. Once I added the jars to the classpath, it worked perfectly well.
spring-integration,apache-kafka
Please be more specific. The Spring Integration Kafka support is just an extension for Spring Integration, which, in turn, is an extension of the Spring Framework. Since you can simply implement a Spring MVC web application, there is nothing stopping you from providing any other integration stuff for it, like...
Is there any upper bound on the number of producers per topic? The only limitation I am aware of is the number of available IP addresses. It is unlikely you'd bump into any practical limit in your described application. Does the number of producers impact Kafka performance? If yes, how? No, all...
java,apache-kafka,kafka-consumer-api
If you look deeper at how message sets and messages are encoded, you'll notice that they are usually preceded by a size in bytes (unlike all other structures where the size is an item count), so the client first reads the size of a message set and then reads N...
As far as my understanding goes, getting good throughput from Kafka doesn't depend only on the cluster size; there are other configurations which need to be considered as well. I will try to share as much as I can. Kafka's throughput is supposed to be linearly scalable with the...
apache-kafka,kafka-consumer-api
I can't yet speak to the performance comparison with the Zookeeper offset storage, but the high level consumer does support storing offsets in Kafka with 0.8.2. You can configure it by setting the property offsets.storage to kafka. You will also want to set the property dual.commit.enabled to true during the...
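A sketch of the relevant high-level consumer properties during and after the migration (other required properties such as group.id and zookeeper.connect omitted):

    # during migration: commit to both Kafka and ZooKeeper
    offsets.storage=kafka
    dual.commit.enabled=true

    # once every consumer in the group has been upgraded
    offsets.storage=kafka
    dual.commit.enabled=false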
cassandra,apache-spark,apache-kafka,spark-streaming
Solved the issue. The columnMapper wasn't able to access the getters and setters of class TestTable, so I changed the access modifier to public. But then I had two public classes in one file, which is an error, so I created another Java file, TestTable.java, with the class declared as public class TestTable implements...
java,eclipse,spring-integration,apache-kafka
You're using Java 8 syntax, but Kepler SR2 itself doesn't support Java 8 syntax without the patches discussed here: https://wiki.eclipse.org/JDT/Eclipse_Java_8_Support_For_Kepler . Luna officially supports Java 8.
apache-kafka,distributed-system,kafka
One of the basic assumptions in the design of Kafka is that the brokers in a cluster will, with very few exceptions (e.g. port), have the same configuration as described in this Kafka Improvement Proposal. As a result, the scenario with inconsistent configurations that you have described in your question...
java,apache,amazon-web-services,apache-kafka,java-api
EC2 IP addresses are internal. You may face some issues when dealing with an EC2 server running Kafka and ZooKeeper. Try setting the advertised.host.name and advertised.port variables in your server.properties file. advertised.host.name should be the IP address of the EC2 server. advertised.port should be the Kafka port, 9092 by default. ...
java,junit,zookeeper,apache-kafka,jbehave
It turns out this has to do with the Time parameter in the new KafkaServer constructor. I was passing in a null param for the kafka.utils.Time object: private KafkaServer server = new KafkaServer(config, null); Instead, you need to create an implementation of the kafka.utils.Time interface, and pass in a new...
java,zookeeper,apache-kafka,kafka-consumer-api
You could have a look at the Kafka Web Console project which already does something similar to what you describe. If you want to do this yourself, you'd need to use the simple consumer API and manually handle offsets for a new consumer group (stored in Zookeeper or elsewhere). You...
apache-kafka,kafka-consumer-api
One instance of SimpleConsumer reads from a single partition. Though you can easily create multiple instances of SimpleConsumer and read different partitions sequentially or in parallel (from different threads). The tricky part is coordination among readers on different machines so they don't read from the same partition (assuming all messages...
Consumer groups relate to the high level consumer API while the ability to choose broker or partition to consume from relates to the simple consumer API. The high level API will do rebalancing among consumers in a group automatically for you but it will consume all partitions for a given...
transactions,storm,apache-kafka,trident
I believe that as long as the messages are not acked, they won't be committed as consumed and the spout will replay them when restarted. On the other hand, if you configure your spout to read from the beginning, then the Kafka spout will fetch them from the starting offset....
java,mongodb,mongodb-query,storm,apache-kafka
I was facing this error because of the MongoDB driver version. I changed this in my pom.xml and things are working fine. <dependency> <groupId>org.mongodb</groupId> <artifactId>mongo-java-driver</artifactId> <version>3.0.1</version> </dependency> ...
java,scala,apache-kafka,kafka-consumer-api
In your code, AdminUtils.createTopic(zkClient, "pa_reliancepoc_telecom_usageevent", 10, 2, new Properties()); the fourth argument is the replication factor. So you are trying to create a topic named pa_reliancepoc_telecom_usageevent with a partition count of 10 and a replication factor of 2, which means two Kafka brokers should be available while creating the topic. If...
Kafka is a distributed publish-subscribe messaging system that is designed to be fast, scalable, and durable. You can go for Kafka if the data will be consumed by multiple applications. Hope this link helps; it has some info on the topic: http://www.datanami.com/2014/12/02/kafka-run-natively-hadoop/...
java,apache-kafka,kafka-consumer-api
From the kafka documentation The partitions in the log serve several purposes. First, they allow the log to scale beyond a size that will fit on a single server. Each individual partition must fit on the servers that host it, but a topic may have many partitions so it can...
java,zookeeper,apache-kafka,kafka-consumer-api
Your exception says it was not able to connect to ZooKeeper. In your code, ZkClient zkClient = new ZkClient(kafkaHost, 10000, 10000, ZKStringSerializer$.MODULE$); the first argument is the ZooKeeper host. I don't know what you have in the kafkaHost variable; as the name implies, I think you have stored...
Kafka properties can be found in the following three files: server.properties, producer.properties, and consumer.properties. These files are available in the folder kafka-folder/config/. By default some properties are already present in those files, and you can add whatever properties you want. The list of properties is given in this link...
Normally your producer takes care of distributing the data to all (or a selected set of) nodes that are up and running by using a partitioning function, either in a round-robin fashion or by using some semantics of your choice. The producer publishes to a partition of a topic and...
apache-kafka,kafka-consumer-api
You can use different consumer group names for each client. Every Kafka consumer is part of a consumer group. The way it works is that all messages are consumed by every consumer group, but within a group each message is consumed by only one consumer. You can read this for more...
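As a sketch, two high-level consumer configurations that each receive every message of the topic, because each client uses its own consumer group (the group names are made up):

    # client A
    group.id=analytics-app
    zookeeper.connect=localhost:2181

    # client B -- gets its own copy of every message
    group.id=audit-app
    zookeeper.connect=localhost:2181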
It is not clear from your question what kind of offset you're interested in. There are actually three types of offsets: the offset of the first available message in the topic's partition (use -2, "earliest", as the --time parameter for the GetOffsetShell tool); the offset of the last available message in the topic's partition....
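For example (broker and topic are placeholders), the first and last available offsets can be queried like this:

    # earliest available offset (-2)
    bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic my-topic --time -2

    # latest offset, i.e. the end of the partition (-1)
    bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic my-topic --time -1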
scala,sbt,apache-spark,apache-kafka
It looks like a conflict resolution problem somewhere deep in Ivy. It might be fixed by manually excluding slf4j dependency from Kafka and explicitly adding dependency on latest version: libraryDependencies ++= Seq( "org.apache.kafka" % "kafka_2.10" % "0.8.2.0" excludeAll( ExclusionRule(organization = "org.slf4j") ), "com.typesafe.scala-logging" %% "scala-logging-slf4j" % "2.1.2", "org.slf4j" % "slf4j-api"...
I solved it. Set 'advertised.host.name' in server.properties of the Kafka broker to the server's real IP (the same as the producer's 'metadata.broker.list' property). Reference: https://issues.apache.org/jira/browse/KAFKA-1092...
According to Kafka's Design : Asynchronous send Batching is one of the big drivers of efficiency, and to enable batching the Kafka producer has an asynchronous mode that accumulates data in memory and sends out larger batches in a single request. The batching can be configured to accumulate no more...
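With the 0.8 (old) producer, that batching behaviour is controlled by a handful of properties; a sketch with illustrative values only:

    # producer config
    producer.type=async                  # accumulate and send in batches instead of one request per message
    batch.num.messages=200               # flush after this many messages...
    queue.buffering.max.ms=500           # ...or after this many milliseconds, whichever comes first
    queue.buffering.max.messages=10000   # bound on how much can pile up in memory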
Kafka is by design a fast, scalable, distributed, partitioned and replicated commit log service, so there is no priority on a topic or message. I also faced the same problem you have. The solution is very simple: create topics in Kafka, let's say 1) high_priority_queue 2) medium_priority_queue 3) low_priority_queue Publish high priority...
There was a problem in the way the spout was configured. We have custom properties we use to initialize the SpoutConfig object, and we always set both forceFromStart and startOffsetTime (to latest). The problem was that the property relating to the latter was configured with the wrong key and hence sometimes the...
spring,spring-integration,apache-kafka,kafka-consumer-api
Yes, you can do that at runtime. See TopicUtils.ensureTopicCreated. You can add it like a <service-activator> as one more subscriber (the first one) to the <publish-subscribe-channel> for sending messages. Something like this: <publish-subscribe-channel id="sendMessageToKafkaChannel"/> <service-activator input-channel="sendMessageToKafkaChannel" output-channel="nullChannel" order="1" ref="creatTopicService" method="creatTopic"/> <int-kafka:outbound-channel-adapter channel="sendMessageToKafkaChannel"...
zookeeper,apache-kafka,spring-xd
It turns out this was a result of a firewall blocking access between subnets
apache-kafka,kafka-consumer-api
You may use SimpleConsumer to achieve exactly what you are asking: no consumer groups, and all consumers can read a single partition. However this approach means you have to handle offset storage and broker failures yourself. Another option is to use the high level consumer with different consumer groups (you...
scala,akka,apache-kafka,consumer
You can try using something like Metrics: https://dropwizard.github.io/metrics/3.1.0/manual/ You can define precise metrics, including timings, and use them inside of your actor.
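A minimal sketch with Dropwizard Metrics, wrapping a Timer around whatever your actor does per message (the metric name and the sleep stand-in are made up):

    import com.codahale.metrics.ConsoleReporter;
    import com.codahale.metrics.MetricRegistry;
    import com.codahale.metrics.Timer;
    import java.util.concurrent.TimeUnit;

    public class ConsumeTimer {
        private static final MetricRegistry registry = new MetricRegistry();
        private static final Timer consumeTimer = registry.timer("consume-one-message");

        public static void main(String[] args) throws Exception {
            // print timer statistics (rates, percentiles) every 10 seconds
            ConsoleReporter.forRegistry(registry).build().start(10, TimeUnit.SECONDS);

            for (int i = 0; i < 100; i++) {
                final Timer.Context ctx = consumeTimer.time();
                try {
                    Thread.sleep(5); // stand-in for processing one Kafka message
                } finally {
                    ctx.stop();      // records the elapsed time
                }
            }
        }
    }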
java,zookeeper,storm,apache-kafka
The problems were caused by the fact the topic had not been instantiated on the Kafka broker.
java,amazon-ec2,apache-kafka,kafka-consumer-api
I suggest you break down the problem. How fast is it without JSON encoding? How fast is one node without replication vs. with replication? Build a picture of how fast each component should be. I also suggest you test bare-metal machines to see how they compare, as they can...
scala,apache-spark,apache-kafka
Specific to your question: it's not feasible to run Spark's Kafka integration against Scala 2.11 for now (Spark 1.3). General method to build from source code: if no pre-built version is available, you could build Spark yourself and fulfil your needs by specifying some build parameters. The detailed build procedure can be found:...
Not possible. On the Kafka mailing list I got this suggestion: you could configure the topic to reside on a tmpfs mount (i.e., an in-memory filesystem with no disk backing). ...
Well, I just found out the problem. Actually you have two options for the Encoder: the DefaultEncoder and the StringEncoder. I was trying to send my id as a String and the message as a byte array using the default encoder. Looking at Encoder.scala, you can see the DefaultEncoder implementation and the...
The answer was simple. I was never telling the bolt which stream to subscribe to. Adding .shuffleGrouping("Kafka Spout"); fixed the issue.
By setting TOPOLOGY_ACKER_EXECUTORS to 0, Storm will acknowledge all tuples immediately when they come off the spout, which may not be reliable, because no mechanism is in place to check whether a tuple was processed or failed. And setMaxSpoutPending tells Storm the maximum number of tuples pending on the spout to...
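For reference, a sketch of where those two knobs are set when building the topology (the values are illustrative only):

    import backtype.storm.Config;

    public class TopologyConfigSketch {
        public static Config buildConfig() {
            Config conf = new Config();
            conf.setNumAckers(1);           // 0 would disable tuple tracking entirely
            conf.setMaxSpoutPending(5000);  // cap on unacked tuples in flight per spout task
            return conf;
        }
    }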
apache-kafka,kafka-consumer-api
Does commitOffsets on the high-level consumer block until offsets are successfully committed? It looks like commitOffsets() loops through each consumer and calls updatePersistentPath if its offset has changed, and if so writes the data via zkClient.writeData(path, getBytes(data)). It appears as though commitOffsets() does block until all the offsets are committed....
java,c++,hadoop,apache-kafka,snappy
They are compatible, librdkafka uses the same compression and framing as the Scala/Java client. Increasing fetch.message.max.bytes allows the consumer to fetch larger messages, or larger batches of messages with each request, but it can usually be left to its default value unless your producers are producing messages larger than this...
So to summarize, the solution to this was to add a route via NAT so that the machine can access its own external IP address. Zookeeper uses the address it finds in advertised.host.name both to tell clients where to find the broker as well as to communicate with the broker...
Can you try adding this to the configuration while creating the Consumer group props.put("auto.offset.reset", "smallest"); ...
This issue was resolved last week. The issue was that I had a file write operation for storing the key of the HBase record, which I was doing to keep the key in case of an exception. The file write operation is not as fast as a single HBase record read, and it increased...
apache,apache-kafka,kafka-consumer-api,kafka
When a new consumer joins a consumer group the set of consumers attempt to "rebalance" the load to assign partitions to each consumer. If the set of consumers changes while this assignment is taking place the rebalance will fail and retry. This setting controls the maximum number of attempts before...
You don't connect to ZooKeeper in the case of a Kafka producer. You have to connect to the broker. For that purpose use the following property: props.put("metadata.broker.list", "localhost:9092, broker1:9092"); Here I have used localhost; in your case it will be 172.25.37.66...
apache-kafka,kafka-consumer-api
In case of SimpleConsumer, clientName is just an identifier for the client. It is not group id. In fact there is no concept of consumer groups in SimpleConsumer. Please refer to documentation - The main reason to use a SimpleConsumer implementation is you want greater control over partition consumption than...
Another way to do this is by modifying information written in /bin/kafka-server-start.sh: export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G" or in /bin/kafka-run-class.sh: KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseCompressedOops -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC -Djava.awt.headless=true" ...
java,php,message-queue,apache-kafka
We also use the Kafka Java library and we do it like @apatel says. I think that in your situation you could try to provide a sidecar to the servers running your PHP app; the sidecar will create a Producer at startup, and the Kafka Java driver will manage multiple connections. Here is some...
log4j,slf4j,logstash,apache-kafka
Some things that you can check: is the topic example1 available? If not, did you use auto-create in Kafka? Check for existing topics like this: bin/kafka-topics.sh --list --zookeeper localhost:2181 example1 should be among the returned items; if not, you can create the topic manually as well: bin/kafka-topics.sh --create --zookeeper...
multithreading,apache-kafka,kafka-consumer-api
Multiple threads cannot consume the same partition unless those threads are in different consumer groups. Only a single thread will consume the messages from a single partition even if you have lots of idle consumers. The number of partitions is the unit of parallelism in Kafka. To make multiple consumers consume...
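Here is a sketch of the 0.8 high-level consumer where the number of requested streams matches the partition count, so each thread ends up with its own partition (topic name, group id and partition count are placeholders):

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;
    import kafka.message.MessageAndMetadata;

    public class PartitionedConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "localhost:2181");
            props.put("group.id", "my-group");

            ConsumerConnector connector = Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

            int numPartitions = 4; // one stream (and one thread) per partition
            Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                    connector.createMessageStreams(Collections.singletonMap("my-topic", numPartitions));

            for (final KafkaStream<byte[], byte[]> stream : streams.get("my-topic")) {
                new Thread(new Runnable() {
                    public void run() {
                        for (MessageAndMetadata<byte[], byte[]> mm : stream) {
                            System.out.println("partition " + mm.partition() + " offset " + mm.offset());
                        }
                    }
                }).start();
            }
        }
    }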
jms,activemq,message-queue,apache-kafka,apollo
Apache ActiveMQ is a great workhorse full of features and nice stuff. It's not the fastest MQ software around, but fast enough for most use cases. Among its features are flexible clustering, fail-over, integrations with different application servers, security, etc. Apache Apollo is an attempt to write a new core for...
c#,apache-kafka,kafka-consumer-api
The Consumer in Kafka-net does not currently auto track the offsets being consumed. You will have to implement the offset tracking manually. To Store the offset in kafka version 0.8.1: var commit = new OffsetCommitRequest { ConsumerGroup = consumerGroup, OffsetCommits = new List<OffsetCommit> { new OffsetCommit { PartitionId = partitionId,...
java,memory,apache-kafka,kafka-consumer-api
While you can't limit the number of messages, you can limit the number of bytes received per topic-partition per request. However, this should be done as a configuration setting rather than as part of your consumer implementation code. The Kafka consumer config docs say that you can specify a maximum...
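As a sketch, the consumer-side setting in question (the value shown is just the usual default):

    # high-level consumer config: max bytes fetched per topic-partition per request
    fetch.message.max.bytes=1048576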
asynchronous,apache-kafka,job-scheduling
You should look at Apache Storm for the processing portion of your consumer and leave the message storage and retrieval to Kafka. What you've described is a very common use case in Big Data (although the 50+ minute thing is a bit extreme). In short, you'll have a small number...
First of all, ZooKeeper is needed only for the high level consumer; SimpleConsumer does not require ZooKeeper to work. The main reason ZooKeeper is needed for a high level consumer is to track consumed offsets and handle load balancing. Now in more detail. Regarding offset tracking, imagine the following scenario: you start...
You need to use the Kafka Java API for creating a Kafka producer (if you intend to use Java). Using Maven is highly recommended as it will help manage all the dependencies for you, but you are free to bypass Maven if you are ready to manage all the required JARs...
apache-kafka,kafka-consumer-api
soTimeout is the time in milliseconds to wait for a connection to the given broker. I don't know that anything special happens with the connection other than you get validation that there's a broker over there that's ready to perform some subsequent actions. I believe that the bufferSize used in...
The problem was that I overlooked a ConsumerTimeoutException that was crashing my Consumer and I mistook this for "the Consumer hanging forever". From the docs on Consumer configuration: By default, this value is -1 and a consumer blocks indefinitely if no new message is available for consumption. I had this...
apache-kafka,google-cloud-dataflow,google-cloud-pubsub
(1) Dataflow doesn't support reading directly from Kafka yet, but we plan to in the future. (2) I'm not quite sure what you mean. Can you provide more details?...
apache-kafka,kafka-consumer-api
OK, so there are two rebalancing algorithms at the moment: Range and RoundRobin. They are also called partition assignment strategies. For simplicity, assume we have a topic T1 with 10 partitions and two consumers with different configurations (for the example to be clearer): C1...
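In 0.8.2 the strategy is chosen per consumer with a single property; a sketch:

    # high-level consumer config
    partition.assignment.strategy=roundrobin   # default is "range"
    # note: roundrobin expects all consumers in the group to subscribe to the same topics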
Are you running the Storm Supervisor? If you deploy a new topology and the Supervisor isn't running, the topology will show up in the UI, but since it's never initialized it doesn't show any stats when you click into it.
apache-kafka,kafka-consumer-api,akka-kafka
This looks like your broker advertises itself incorrectly. There's a line in your broker's server.properties: #advertised.host.name=<hostname routable by clients> You should uncomment it and set the value routable by your consumer and restart your broker....
docker,zookeeper,apache-kafka,fig
The configuration that's been working for me without any issues for the last two days involves specifying host addresses for both Zookeeper and Kafka. My fig.yml content is: zookeeper: image: wurstmeister/zookeeper ports: - "xx.xx.x.xxx:2181:2181" kafka: image: wurstmeister/kafka:0.8.2.0 ports: - "9092:9092" links: - zookeeper:zk environment: KAFKA_ADVERTISED_HOST_NAME: xx.xx.x.xxx KAFKA_NUM_REPLICA_FETCHERS: 4 ...other env...
According to the configuration page, zookeeper.connect is a property for the broker and/or the consumer, not a producer property; instead you will need to set metadata.broker.list, described in section 3.3, Producer Configs. Hope it helps!...
apache,email,apache-kafka,mailing-list
I have been using http://www.search-hadoop.com/ for many years. It's my go-to website for searching mail archives. On the right-hand side, you can select which project you want to search.
Presumably Storm will give me automatic fault tolerance and ease of rebalancing? Yep, it's all about fault tolerance and rebalancing: Storm will keep an eye on most of the components and track whether a batch was successfully processed or not. If it wasn't, it will conveniently replay it. UI and...
EDIT: the new timeout.ms property works together with the acks configuration of the producer. For example, consider the following situation: acks = all, timeout.ms = 3000. In this case acks = all means that the leader will not respond until it receives acknowledgement from the full set of in-sync replicas (ISR)...
If checkpointing is enabled in Spark Streaming, then objects used in a function called in foreachRDD should be Serializable. Otherwise, there will be an "ERROR OneForOneStrategy". The code will run if checkpointing is turned off.
java,scala,apache-kafka,kafka-consumer-api
You need to add your hostname to /etc/hosts: 127.0.0.1 localhost linux-pic4.site See here for deeper explanation: InetAddress.getLocalHost() throws UnknownHostException...
I think you need to define props.put("partitioner.class", "example.producer.SimplePartitioner"); The wiki page says The third property "partitioner.class" defines what class to use to determine which Partition in the Topic the message is to be sent to. This is optional, but for any non-trivial implementation you are going to want to implement...
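For reference, a sketch of what such a partitioner class might look like with the 0.8 producer API (example.producer.SimplePartitioner is the name from the question; the hashing logic here is purely illustrative):

    package example.producer;

    import kafka.producer.Partitioner;
    import kafka.utils.VerifiableProperties;

    public class SimplePartitioner implements Partitioner {

        // the old producer instantiates partitioners reflectively with this constructor
        public SimplePartitioner(VerifiableProperties props) {
        }

        public int partition(Object key, int numPartitions) {
            // route by the key's hash; any deterministic scheme works
            return Math.abs(key.hashCode()) % numPartitions;
        }
    }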