
Oozie date time start

cloudera,hue,restfb,flume-ng,oozie-coordinator

What about creating a Java action and setting up a workflow property that uses the coordinator's current time?

    <property>
      <name>myStart</name>
      <value>${coord:current(0)}</value>
    </property>

Then use this property in your action as a parameter....
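Roughly, that property would sit in the coordinator's workflow configuration. A minimal sketch (app name, schedule, and paths are illustrative; depending on the setup, ${coord:nominalTime()} may be the EL function needed instead):

    <coordinator-app name="my-coord" frequency="${coord:hours(1)}"
                     start="2015-01-01T00:00Z" end="2016-01-01T00:00Z"
                     timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
      <action>
        <workflow>
          <app-path>${workflowAppUri}</app-path>
          <configuration>
            <!-- handed down to the workflow; the java action can read it as ${myStart} -->
            <property>
              <name>myStart</name>
              <value>${coord:current(0)}</value>
            </property>
          </configuration>
        </workflow>
      </action>
    </coordinator-app>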

Spark stream unable to read files created from flume in hdfs

hadoop,apache-spark,hdfs,spark-streaming,flume-ng

You have detected the problem yourself: while the stream of data continues, the HDFS file is "locked" and cannot be read by any other process. On the contrary, as you have experienced, if you put in a batch of data (that's your file: a batch, not a stream), once it...
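One common mitigation, sketched below under assumed agent/sink names, is to mark in-progress files with a leading dot so readers such as Spark Streaming's file stream (which skips hidden paths) ignore them until they are closed and renamed:

    agent.sinks.hdfs-sink.type = hdfs
    agent.sinks.hdfs-sink.hdfs.path = hdfs://namenode:8020/flume/events
    # files being written get a leading dot and a .tmp suffix,
    # and are renamed to their final name only when rolled/closed
    agent.sinks.hdfs-sink.hdfs.inUsePrefix = .
    agent.sinks.hdfs-sink.hdfs.inUseSuffix = .tmp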

flume : find ip/hostname of event sender?

flume,flume-ng

For anyone who has a similar problem: I ended up removing the log4j-flume appender on the application side and replaced it with the log4j-syslog appender; on the Flume side I configured a syslogudp source. So finally it looks something like this. On the application side, in the log4j config:

    log4j.appender.syslog=org.apache.log4j.net.SyslogAppender
    log4j.appender.syslog.Facility=LOCAL7
    log4j.appender.syslog.FacilityPrinting=false
    log4j.appender.syslog.Header=true
    log4j.appender.syslog.SyslogHost=flume1.host.net:41473
    log4j.appender.syslog.layout=org.apache.log4j.PatternLayout...
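The matching Flume side might look like this (a sketch with illustrative agent and channel names; the syslog header carries the sender's hostname, which ends up in the event headers):

    agent.sources.syslog-src.type = syslogudp
    agent.sources.syslog-src.host = 0.0.0.0
    agent.sources.syslog-src.port = 41473
    agent.sources.syslog-src.channels = mem-channel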

Loading csv file into HDFS using Flume (spool directory as source)

hadoop,hadoop-streaming,flume,hortonworks-data-platform,flume-ng

I was using the Hortonworks Sandbox v2.2. After a long time of debugging, I found out there were some conflicts between the Spark version I had installed manually (v1.2) and the Hortonworks Sandbox libraries, so I decided to use Cloudera QuickStart 5.3.0 and now everything is working fine.

Move whole file into HDFS as single file using flume spooling directory

flume-ng

Well, sort of. You need to tweak your configuration to reflect that, because Flume wasn't designed to shove around entire files regardless of their size; you can more effectively use hadoop fs -copyFromLocal to do that. Here's a list of things you need to configure: a) batch channel size must...
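A sketch of the kind of tuning involved (names and sizes are illustrative; the idea is to raise batch/transaction sizes and disable size- and count-based rolling so one input file is not split):

    agent.sources.spool.type = spooldir
    agent.sources.spool.spoolDir = /var/spool/inbox
    agent.sources.spool.batchSize = 10000
    agent.channels.mem.type = memory
    agent.channels.mem.capacity = 100000
    agent.channels.mem.transactionCapacity = 10000
    agent.sinks.out.type = hdfs
    agent.sinks.out.hdfs.path = /flume/files
    # disable event-count, size, and time based rolling; close the file when idle
    agent.sinks.out.hdfs.rollCount = 0
    agent.sinks.out.hdfs.rollSize = 0
    agent.sinks.out.hdfs.rollInterval = 0
    agent.sinks.out.hdfs.idleTimeout = 60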

I need a Cassandra Flume Sink

cassandra,nosql,flume,flume-ng

Have you looked at stratio-ingestion? Stratio Ingestion is a Flume distribution with some additional sources/sinks/morphlines and fixes for some official bugs. It also has a Cassandra sink with which you can easily insert your Flume events with different configurations....

How to use flume for uploading zip files to hdfs sink

flume,flume-ng

Flume will try to read your files line by line, unless you configure a specific deserializer. A deserializer lets you control how the file is parsed and split into events. You could of course follow the example of the blob deserializer, which is designed for PDFs and such, but...
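For reference, a spooling-directory source can be pointed at the blob deserializer that ships with the morphline Solr sink (a sketch; the source name, directory, and size cap are illustrative):

    agent.sources.spool.type = spooldir
    agent.sources.spool.spoolDir = /var/spool/zips
    # read each file as a single binary event instead of line by line
    agent.sources.spool.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
    agent.sources.spool.deserializer.maxBlobLength = 100000000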

Flume agent does not stop retrying for unrecoverable solr error

solr,flume,avro,flume-ng

You should fix the reason for the failed requests. Flume is doing exactly what it's designed to do: it's transactionally trying to store the batch of events in your store. If it can't store those events then, yes, it keeps on trying. You haven't explained what problem is causing...

What causes flume with GCS sink to throw a OutOfMemoryException

docker,google-cloud-storage,flume-ng,google-hadoop

When uploading files, the GCS Hadoop FileSystem implementation sets aside a fairly large (64MB) write buffer per FSDataOutputStream (file open for write). This can be changed by setting "fs.gs.io.buffersize.write" to a smaller value, in bytes, in core-site.xml. I imagine 1MB would suffice for low-volume log collection. In addition, check what...
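Concretely, that would be an entry like the following in core-site.xml (the 1 MB value is the answer's own suggestion, expressed in bytes):

    <property>
      <name>fs.gs.io.buffersize.write</name>
      <value>1048576</value>
    </property>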

How to use Flume executing pre-process on source and keeping real filename in hdfs sink

hadoop,hdfs,flume,flume-ng

The original file name can be preserved as a header if you specify

    agent.sources.seqGenSrc.fileHeader = true

That can then be retrieved in your sink. If you want to manipulate the data within your files, use an interceptor. You should be aware that an event is basically a line within a file...
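A sketch of how such a header can then drive the output name in the HDFS sink (sink name is illustrative; basenameHeader is an addition beyond the answer above and stores just the file name, without the directory part):

    agent.sources.seqGenSrc.fileHeader = true
    agent.sources.seqGenSrc.basenameHeader = true
    agent.sinks.hdfs-sink.hdfs.path = /flume/in
    # %{basename} expands to the value of the "basename" event header
    agent.sinks.hdfs-sink.hdfs.filePrefix = %{basename}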

Flume - Can an entire file be considered an event in Flume?

hadoop,flume,flume-ng

For starters, Flume doesn't work on files as such, but on things called events. Events are Avro structures which can contain anything, usually a line, but in your case it might be an entire file. An interceptor gives you the ability to extract information from your event and add...
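As an illustration of the interceptor part (a sketch with illustrative names, using two stock interceptors):

    agent.sources.src.interceptors = ts host
    # stamps each event with the current time in the "timestamp" header
    agent.sources.src.interceptors.ts.type = timestamp
    # records the agent's host/IP in a configurable header
    agent.sources.src.interceptors.host.type = host
    agent.sources.src.interceptors.host.hostHeader = hostname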

Flume: kafka channel and hdfs sink get unable to deliver event error

hadoop,hdfs,apache-kafka,flume,flume-ng

Based on the answer from the community, this question can be resolved by following these two JIRA issues: https://issues.apache.org/jira/browse/FLUME-2734 https://issues.apache.org/jira/browse/FLUME-2735...

Impala create external table, stored by Hive

twitter,hbase,flume,impala,flume-ng

Well, it seems that Impala still does not support SerDes (serialization/deserialization). "You create the tables on the Impala side using the Hive shell, because the Impala CREATE TABLE statement currently does not support custom SerDes and some other syntax needed for these tables: You designate it as an HBase table...

Node.js to Flume-NG

node.js,thrift,flume,avro,flume-ng

This is the setup of the Thrift server implemented in FLUME-1894, file ThriftSource.java:

    args.protocolFactory(new TCompactProtocol.Factory());
    args.inputTransportFactory(new TFastFramedTransport.Factory());
    args.outputTransportFactory(new TFastFramedTransport.Factory());
    args.processor(new ThriftSourceProtocol.Processor<ThriftSourceHandler>(new ThriftSourceHandler()));

To get it to work, you need to use a compatible stack on the client side: the compact protocol and the fast framed transport. There is already a client implementation in...

Duplicate channel before being intercepted by interceptor

flume,flume-ng

Since interceptors are configured per source, you will have to add a second source (configured with no interceptors at all and listening on a different HTTP port), and emit your data twice: one copy for the source with interceptors, and one copy to the other source. Another possibility is to...
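A sketch of that two-source layout (agent, source, channel names, ports, and the interceptor type are illustrative):

    agent.sources = intercepted plain
    agent.sources.intercepted.type = http
    agent.sources.intercepted.port = 8081
    agent.sources.intercepted.interceptors = i1
    agent.sources.intercepted.interceptors.i1.type = regex_filter
    agent.sources.intercepted.channels = ch1
    # second source: same data posted again, no interceptors
    agent.sources.plain.type = http
    agent.sources.plain.port = 8082
    agent.sources.plain.channels = ch2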

How to set log filename in flume

apache,logging,flume,flume-ng,flume-twitter

Having a look at the documentation, it seems there is no parameter for configuring the name of the files that are going to be created. I've gone through the sources looking for some hidden parameter, but there is none :) Going into the details of the implementation, it seems...

Expected timestamp in the Flume event headers, but it was null

flume,flume-ng,flume-twitter

In twitter.conf, I added one more config property,

    TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true

and the issue got resolved. For more details refer to Hadoop tutorial.info...
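An alternative worth noting (an assumption beyond this answer; the source name Twitter is illustrative) is to add a timestamp interceptor on the source, so the header the HDFS sink expects is set explicitly:

    TwitterAgent.sources.Twitter.interceptors = ts
    # writes the "timestamp" header that hdfs.path escapes like %Y-%m-%d rely on
    TwitterAgent.sources.Twitter.interceptors.ts.type = timestamp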

Flume 1.6 kafka source

flume-ng

You can solve this error by providing the flume-env.sh file with the path to a ZooKeeper jar. After that just restart your agent and you should see the data from your Kafka topic flowing.

    FLUME_CLASSPATH="/path/to/hadoop-2.5.0-cdh5.3.1/share/hadoop/common/lib/zookeeper-3.4.5-cdh5.3.1.jar"

...

Flume to HDFS split a file to lots of files

hadoop,hdfs,flume,flume-ng

You need to disable the roll timeout too; that's done with the following settings:

    tier1.sinks.hdfs-sink.hdfs.rollCount = 0
    tier1.sinks.hdfs-sink.hdfs.rollInterval = 300

rollCount = 0 prevents count-based rollovers; rollInterval is set here to 300 seconds, and setting it to 0 would disable time-based rollovers. You will have to choose which mechanism you want for rollovers, otherwise Flume...
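For completeness, there is also a size-based trigger (an assumption beyond the answer above; the ~128 MB figure is illustrative), so a "single roll criterion" setup might look like:

    tier1.sinks.hdfs-sink.hdfs.rollCount = 0
    tier1.sinks.hdfs-sink.hdfs.rollInterval = 0
    # roll only when the file reaches ~128 MB
    tier1.sinks.hdfs-sink.hdfs.rollSize = 134217728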

Flume-ng hdfs sink .tmp file refresh rate control proprty

cloudera,flume,hortonworks-data-platform,flume-ng,flume-twitter

Consider decreasing your channel's capacity and transactionCapacity settings:

    capacity             100   The maximum number of events stored in the channel
    transactionCapacity  100   The maximum number of events the channel will take from a source or give to a sink per transaction

These settings are responsible for controlling how many events get...
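In agent-configuration form that would be (a sketch; agent and channel names are illustrative):

    agent.channels.mem.type = memory
    agent.channels.mem.capacity = 100
    agent.channels.mem.transactionCapacity = 100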

Apache Flume /var/log/flume-ng/flume.log (Permission denied)

java,hadoop,flume,flume-ng

Check the file permissions for /var/log/flume-ng/flume.log and change them:

    sudo chown flume /var/log/flume-ng/flume.log

[java.io.FileNotFoundException: /var/log/flume-ng/flume.log (Permission denied)]...