
Loading data into Hive/Impala

hadoop,hive,oozie,impala

In our case, the incremental data goes into a new partition in the Hive table every time. So, in step 3 (of the above-mentioned steps), we simply add a new partition to the table. In the case of multiple workflows working in parallel, if each of them loads data into a...

Error on running multiple Workflows in OOZIE-4.1.0

java,hadoop,mapreduce,oozie,oozie-coordinator

The problem is with the queue. When we run the jobs in the same (default) queue with the above cluster setup, the ResourceManager is responsible for running the MapReduce jobs on the slave node. Due to a lack of resources on the slave node, the jobs waiting in the queue will hit a deadlock situation. In order...

Unable to retrieve a property from oozie.action.conf.xml

java,hadoop,action,config,oozie

The XML file is there; however, it couldn't be loaded by loadFromXML(). Try using the Hadoop Configuration class instead: import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.Path; ... Configuration conf = new Configuration(false); conf.addResource(new Path(System.getProperty("oozie.action.conf.xml"))); String loadType = conf.get("load.type") ... It should work....

Can an Oozie Action point to multiple actions?

oozie

I'm not exactly sure about your question, so let me know if you meant something else. Oozie prevents having multiple actions pointing to one action. The join following a fork is the only exception here. I tend to work around this with clever workflow design and good use...

I cannot find Apache Oozie Hadoop Libs tar.gz file in hadooplibs folder

apache,hadoop,oozie

This issue is solved by moving from 4.2.0 to 4.1.0 version. I have already reported this issue to Oozie bug tracking system. https://issues.apache.org/jira/browse/OOZIE-2297...

How to use Oozie action with input path depending on flag

conditional-statements,oozie

Oozie allows you to use EL expressions, which include the conditional operator ?:. This makes it fairly straightforward to implement a default path when the specified path does not exist: ${fs:exists(specified_path) ? specified_path : default_path} ...
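A hedged sketch of how that expression might sit inside a workflow action. The node names, variable names (specified_path, default_path), and the Pig script are hypothetical placeholders, not from the original answer:

```xml
<action name="process-data">
  <pig>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <script>process.pig</script>
    <!-- fall back to default_path when specified_path does not exist on HDFS -->
    <param>INPUT=${fs:exists(specified_path) ? specified_path : default_path}</param>
  </pig>
  <ok to="end"/>
  <error to="fail"/>
</action>
```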

Oozie not working in global section with subworkflow action

oozie

To pass properties from a workflow to its sub-workflow you need to add the <propagate-configuration/> tag to the <sub-workflow> action. For example: <action name="main-node"> <sub-workflow> <app-path>/user/${wf:user()}/${examplesRoot}/apps/map-reduce/workflow.xml</app-path> <propagate-configuration/> </sub-workflow> <ok to="end" /> <error to="fail" /> </action> ...

Oozie workflow parameter not getting set from coordinator

oozie,hue,oozie-coordinator

In the workflow, replace ${wf:conf(DATE)} with ${DATE}; that way it will be parameterized correctly.

Is it possible to use two “job.properties” file in a workflow oozie?

hadoop,workflow,oozie,properties-file

You can remove the path in the Pig script itself, or you can call an fs action in the workflow before executing the Pig action; both will work. For Pig, put the rmf command at the beginning of the script: rmf In the workflow: <action name="prepare"> <fs> <delete path="${pigOutput}"/> </fs> <ok...
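Filling in the fs-action variant with the usual transitions; a minimal sketch, assuming the output path lives in a variable (here called pigOutput) and that the next node is the Pig action (node names are placeholders):

```xml
<action name="prepare">
  <fs>
    <!-- delete the Pig output directory so the job can re-create it -->
    <delete path="${pigOutput}"/>
  </fs>
  <ok to="pig-node"/>
  <error to="fail"/>
</action>
```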

Rest API for Oozie workflow created through HUE

hadoop,oozie,hue

I did not want to raise a question and then answer it, but since I had to search for some time, the answer might help someone... When an Oozie workflow is created using Hue, a workflow XML is created by Hue and placed in an HDFS location. This file can...

oozie coordinator jobs not starting at the given start time

hadoop,oozie,oozie-coordinator

Did you cross-check the GMT time against your IST time? Make sure you convert the IST time for your jobstart variable.

Oozie coordinator with sysdate as start time

oozie,oozie-coordinator

You can make the coordinator's "start" refer to a variable, startTime, then override its value with sysdate from the command line, such as: oozie job -run -config ./coord.properties -DstartTime=`date -u "+%Y-%m-%dT%H:00Z"` Adjust the time format if you are not using the UTC time zone on your system. Sample coordinator job XML: <coordinator-app...

Oozie or Shell Script for Workflow orchestration in Hadoop

hadoop,oozie,cloudera-cdh,hortonworks-data-platform,oozie-coordinator

Generally speaking, Oozie has several advantages here: it generates a DAG each time, so you have a direct view of your workflow; easier access to the log files for each action, such as hive, pig, etc.; you will have your full history for each run of your task; better schedule...

Using JobControl in Oozie Java action

hadoop,oozie

JobControl can be used in Oozie, and it is natural: Oozie can have a single Java action which invokes a sequence of jobs using JobControl. Oozie, running as a service in the cluster, adds overhead to applications, while JobControl just runs on the client machine. And therefore it...

Oozie example stuck when run pig job

hadoop,apache-pig,oozie

I faced the same problem while scheduling pig-0.12.1 in oozie-4.0.1 with hadoop-2.2.0. I was not able to schedule a Pig script using Oozie in hadoop-2.2.0 on a single-node cluster, but I did it on a multi-node cluster by making the following changes. The NodeManager and ResourceManager were running on the same system, so I am...

Propogating an Oozie coordinator's run date into the workflow

hadoop,oozie,oozie-coordinator

You will probably find it easier to use the built-in EL function coord:formatTime(String timeStamp, String format) to convert between timestamp formats: https://oozie.apache.org/docs/4.1.0/CoordinatorFunctionalSpec.html#a6.9.2._coord:formatTimeString_ts_String_format_EL_Function_since_Oozie_2.3.2 For example: ${coord:formatTime(coord:actualTime(), "yyyyMMdd")} ...
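For illustration only, the conversion that coord:formatTime performs on the server can be reproduced in plain Java, which shows the input and output formats involved (the class name and method are invented for this sketch; Oozie's actual implementation may differ in details):

```java
import java.text.SimpleDateFormat;
import java.util.TimeZone;

// Sketch of the timestamp conversion behind coord:formatTime:
// an Oozie coordinator nominal time like "2015-06-01T00:00Z" is
// reformatted with a java.text.SimpleDateFormat pattern.
public class CoordTimeFormat {
    public static String format(String nominalTime, String pattern) throws Exception {
        SimpleDateFormat in = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm'Z'");
        in.setTimeZone(TimeZone.getTimeZone("UTC"));
        SimpleDateFormat out = new SimpleDateFormat(pattern);
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        return out.format(in.parse(nominalTime));
    }
}
```

So ${coord:formatTime(coord:actualTime(), "yyyyMMdd")} turns a nominal time of 2015-06-01T00:00Z into 20150601.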

Running Oozie using Cloudera VM issue

hadoop,oozie

In job.properties file, I replaced localhost with: localhost.localdomain. And it fixed the problem

How do I get user jars to take precedence over hadoop jars, for oozie, using hadoop 1.3

hadoop,oozie

If you are using Maven one approach would be to shade the problematic classes within your jar using the Maven Shade Plugin. This will transparently rename the classes within your jar so that they do not clash with different versions of the same classes that are otherwise put on the...

Oozie: propagate-configuration does not work

hadoop,configuration,mapreduce,oozie,orchestration

I ran into the same problem. After some scrambling, the workaround I found to work is to add a configuration section within the sub-workflow in the base XML. The properties inside the configuration section will be passed down to the sub-workflow, e.g.: ... <action name="srv_b"> <sub-workflow> <app-path>a.xml</app-path> <propagate-configuration /> <configuration> <property> <name>paths.prefix.metadata</name> <value>${nameNode}${fimProcessingMetadataPath}</value> </property>...
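Completing that truncated fragment into a well-formed action; the ok/error target node names are placeholders I have added, everything else follows the snippet:

```xml
<action name="srv_b">
  <sub-workflow>
    <app-path>a.xml</app-path>
    <propagate-configuration/>
    <!-- properties listed here are passed down to the sub-workflow -->
    <configuration>
      <property>
        <name>paths.prefix.metadata</name>
        <value>${nameNode}${fimProcessingMetadataPath}</value>
      </property>
    </configuration>
  </sub-workflow>
  <ok to="end"/>
  <error to="fail"/>
</action>
```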

Oozie job stuck at START action in PREP state

hadoop,oozie

I bet your map-reduce cluster must be running out of slots. Check how many map slots are configured. Also try to figure out whether the service is up on port 8032. You could use the command sudo netstat -netulp | grep 8032. If no output is returned, then...

How to find the exact hadoop jar command which was running my job?

hadoop,yarn,oozie,cascading,scalding

I don't have a direct answer to your question, but JDiagnostics could help you recreate the parameters needed, like the classpath or environment variables. Here is an example you can put at the beginning of your program before you run it: LOG.info(new DefaultQuery().call()) ...

hive query that will fail if table is empty

hive,oozie

Oozie doesn't fail the action, because Oozie sees that the hive query was executed successfully; it doesn't care about anything else. A workaround for your case: a hive action that loads the table, then another hive action that checks the count of the table and captures the output, then use a decision...
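A hedged sketch of the decision-node part of that workaround. It assumes a preceding action named check-table that emitted a key called count via capture-output; all node and key names here are hypothetical:

```xml
<decision name="check-count">
  <switch>
    <!-- "count" is assumed to be a key written by the "check-table"
         action through <capture-output/> -->
    <case to="fail">${wf:actionData('check-table')['count'] eq '0'}</case>
    <default to="next-step"/>
  </switch>
</decision>
```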

where does oozie stores the captured output values of the Java action (or) any action

hadoop,oozie

Oozie's Java action stores captured output/exported properties in a property file defined by the Hadoop job attribute oozie.action.output.properties at runtime. When the action completes, the data is then serialized to Oozie's backend data store (MySQL or an in-memory DB), in table oozie.WF_ACTIONS, column data. The data here is then visible...

Big Data Analytics using Redshift vs Spark, Oozie Workflow Scheduler with Redshift Analytics

apache-spark,analytics,bigdata,oozie,amazon-redshift

You cannot access data stored on Redshift nodes directly (e.g. via Spark), only via SQL queries submitted to the cluster as a whole. My suggestion would be to use Redshift as long as possible and only take on the complexity of Spark/Hadoop when you absolutely need it. If, in the future,...

Oozie job taking longer than scheduled interval

oozie,oozie-coordinator

Oozie won't run the next job before the previous one is over. If the first job takes more than 15 minutes to execute, then the next one will run after its scheduled time. So the scheduled time and the actual run time may differ in Oozie.

Oozie can't find JDBC drivers in Sqoop

hadoop,sqoop,oozie,sqoop2

You need to add all library files, such as JDBC drivers, to the Oozie share lib folder, inside the sqoop folder. This should resolve your issue. To check the library files invoked/used by the job, go to the job tracker for the corresponding job, and in the syslogs you will...

How to bring data from external sources (mainly Restful) to HDFS?

java,rest,hadoop,mapreduce,oozie

If all you aim to do is bring the data out of a database into HDFS in TSV form, then this can be done very easily using the Sqoop tool. Sqoop is a Hadoop ecosystem component; it can connect directly to your RDBMS database and can import the records of...

How to pass Hive set parameters in oozie workflow

hadoop,hive,oozie

It depends on the way you invoke the hive query (hql) file. If you are using a hive action in the workflow, you may specify a hive configuration parameter inside a property tag in the configuration section, or inside the hql file myscript.q <workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1"> ... <action name="myfirsthivejob"> <hive xmlns="uri:oozie:hive-action:0.2"> <job-tracker>foo:9001</job-tracker> <name-node>bar:9000</name-node> <prepare> <delete...
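A fuller sketch of the first option, a hive action carrying a set-style parameter in its configuration section. The property name shown (hive.exec.dynamic.partition.mode) is just an example of a hive "set" parameter, and the transition targets are placeholders:

```xml
<action name="myfirsthivejob">
  <hive xmlns="uri:oozie:hive-action:0.2">
    <job-tracker>foo:9001</job-tracker>
    <name-node>bar:9000</name-node>
    <!-- equivalent of "set hive.exec.dynamic.partition.mode=nonstrict;" in the hql file -->
    <configuration>
      <property>
        <name>hive.exec.dynamic.partition.mode</name>
        <value>nonstrict</value>
      </property>
    </configuration>
    <script>myscript.q</script>
  </hive>
  <ok to="end"/>
  <error to="fail"/>
</action>
```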

Oozie: workflow: How to get the last successful action

hadoop,hdfs,oozie

There's no direct way to get the last "successful" action, AFAIK. If you think about it outside a specific context for a moment: it's not easy to define "success" considering fork/join, control nodes, etc. However, once the criteria are defined, I guess it's possible to find the last "successful" node using Oozie's REST API....

Accessing and manipulating the date in Oozie

hadoop,oozie

After lots of messing around and research, I've found the following solution. Unlike the other answer, it does not require inserting one variable per required date format into the job. My solution is based on using an EL function, basically a UDF but for Oozie. Solution: Create an EL...

Not able to run oozie workflow with java action

java,hadoop,oozie,hue

I think you've got some clock skew between some of the hosts in your Hadoop cluster: I'm guessing between the Oozie server and whichever host ran the launcher for your job. Those values look like timestamps in milliseconds since the epoch, and it would make sense for it to be an...

Accessing Vertica Database through Oozie sqoop

sqoop,oozie,vertica

As the error states: Could not load db driver class: dbDriver. There are likely two problems: the JDBC URL is probably incorrect, and the JDBC jar needs to be included in the workflow. For the JDBC URL, make sure it looks like this: jdbc:vertica://VerticaHost:portNumber/databaseName For the JDBC jar, it needs to...

What are the different ways of submitting Hadoop jobs?

command-line-interface,oozie,hadoop2

What do "Hadoop jobs" mean? MapReduce: CLI + Oozie. Spark: spark-shell + Java/Python/Scala + Spark Job Server (REST) + Oozie. Hive: CLI + Java via Thrift (JDBC) + Oozie. Pig: CLI + Java + Oozie. Etc. If you want to use...

Oozie coordinator start date set to actual date

date,hadoop,oozie,oozie-coordinator

One idea is to pass the sysdate from a shell script to the coordinator job through the command line. See if the answer to a similar question works for you: Oozie coordinator with sysdate as start time...

E0701 XML schema error in OOZIE workflow

sqoop,oozie

Confirm that you have copied your workflow.xml to HDFS. You need not copy job.properties to HDFS, but you do have to copy all the other files and libraries there.

Pig Jobs are stucked when they run concurrently

apache-pig,oozie,hue,cloudera-cdh

This could really be the YARN gotcha #5 of http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/ ?

Issue while running Oozie

hadoop,oozie

You need to add these properties in core-site.xml for impersonation in order to solve your whitelist error <property> <name>hadoop.proxyuser.oozie.hosts</name> <value>*</value> </property> <property> <name>hadoop.proxyuser.oozie.groups</name> <value>*</value> </property> ...

Submit oozie jobs from eclipse from different user?

hadoop,oozie

I tried this with Hadoop to execute MapReduce jobs as a different user. You may try something similar to this... UserGroupInformation ugi = UserGroupInformation.createRemoteUser("root"); ugi.doAs(new PrivilegedExceptionAction<Void>() { // implement the run() method here - generally we submit the job in this block public Void run(){ //submit the job in this block } }); ...

Is the installation of Pig,Hive,Hbase,Oozie,Zookeeper same in Hadoop 2.0 as in Hadoop 1.0?

hadoop,hive,hbase,apache-pig,oozie

http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-2-1/CDH4-Installation-Guide/cdh4ig_topic_16_2.html This may be useful. Also, I think the configuration isn't different from v1.

How do I access in oozie a custom hadoop counter which is created in a java action, in a subsequent action

hadoop,oozie

The hadoop:* EL functions are only available for Oozie MapReduce actions, so you won't be able to use them for your Java action even though it presumably ran a MapReduce job. Instead, you can use the <capture-output/> tag in your Java action to pass output into the Oozie workflow context....
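A minimal sketch of a Java action that hands a value back to the workflow through <capture-output/>. The class name and the property key record.count are invented for this example; the oozie.action.output.properties system property and the Java-properties file format are what Oozie's Java action actually uses:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.Properties;

// Java action that exports a value to the Oozie workflow context.
// Requires <capture-output/> inside the <java> action definition.
public class CaptureOutputAction {
    public static void main(String[] args) throws Exception {
        // Oozie sets this system property to the file it reads back after the action.
        String outputFile = System.getProperty("oozie.action.output.properties");
        if (outputFile == null) {
            throw new IllegalStateException("not running under an Oozie Java action");
        }
        Properties props = new Properties();
        props.setProperty("record.count", "42"); // value computed by the action
        try (OutputStream os = new FileOutputStream(new File(outputFile))) {
            props.store(os, null); // Oozie parses this as a Java properties file
        }
    }
}
```

A subsequent action can then read the value with an EL expression like ${wf:actionData('java-node')['record.count']}, where java-node is whatever name the action was given.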

oozie 4.1.0 launcher fails with OozieLauncherInputFormat$EmptySplit not found

hadoop,oozie,hortonworks-data-platform

The Hortonworks Hadoop companion files contain an oozie-site.xml property, oozie.services, with a missing entry that enables ShareLibService. This causes the new shared lib feature not to work, since the endpoint is not registered. To fix this, add the org.apache.oozie.service.ShareLibService entry to the oozie.services list. Be careful, as the services are not independent, so the order matters!...

Oozie null pointer exception when submitting jobs

cloudera,oozie,oozie-coordinator

I would like to add to the answer to this question. When the Oozie client gets a NullPointerException, that usually means the request caused a server thread failure. If you want to find out the "true" reason, you should look at the server log, such as /var/log/oozie/oozie.log for CDH. There you will find...

Sqoop Job via Oozie HDP 2.1 not creating job.splitmetainfo

hadoop,mapreduce,sqoop,oozie,hortonworks-data-platform

So here is the way I solved it. We are using CDH5 to run Camus to pull data from Kafka. We run CamusJob, which is responsible for getting data from Kafka, using the command line: hadoop jar... The problem is that the new hosts didn't get the so-called "yarn-gateway". Cloudera names the pack of...

Is it possible to run the map reduce job in oozie 4.1.0 with hadoop 2.5.2

java,hadoop,mapreduce,oozie

It is probably a memory problem. Set the properties below in yarn-site.xml and try to run the job: <property> <name>yarn.nodemanager.resource.memory-mb</name> <value>20960</value> </property> <property> <name>yarn.scheduler.minimum-allocation-mb</name> <value>512</value> </property> <property> <name>yarn.scheduler.maximum-allocation-mb</name> <value>2048</value> </property> ...

Unable to Start Hive Action from Second Run of Oozie Workflow Job

hadoop,hive,oozie

The issue got resolved once the OJDBC driver was placed in the shared lib location in HDFS. The said jar was not available in the Oozie shared location, whereas the Derby jar was, so Oozie was trying to connect to Derby for the Hive metastore by default....

Oozie invalid user in secure mode

hadoop,kerberos,oozie

I just found the answer in the Oozie Authentication docs: once authentication is performed successfully, the received authentication token is cached in the user home directory in the .oozie-auth-token file with owner-only permissions, and subsequent requests reuse the cached token while it is valid. This is the reason an invalid user is used even when getting the...

Add Spark to Oozie shared lib

hadoop,apache-spark,oozie

A Spark action is scheduled to be released with Oozie 4.2.0, even though the doc seems to be a bit behind. See the related JIRA here: Oozie JIRA - Add spark action executor Cloudera's release CDH 5.4 already has it, though; see the official doc here: CDH 5.4 oozie doc - Oozie...

How to loop in oozie using sub-workflow?

java,hadoop,oozie

After internal discussion we decided to use Velocity to produce the workflow. So instead of iterating N times, we just generate N nodes. I still don't know whether it is possible to iterate or not....

Shortening Oozie workflows

hadoop,oozie

You can shorten your workflow XML by using global configurations. This way you can put all the shared properties in the global section; it looks something like this: <workflow-app xmlns="uri:oozie:workflow:0.4" name="wf-name"> <global> <job-tracker>${job-tracker}</job-tracker> <name-node>${name-node}</name-node> <job-xml>job1.xml</job-xml> <configuration> <property> <name>mapred.job.queue.name</name> <value>${queueName}</value> </property>...

Hadoop job fails, Resource Manager doesnt recognize AttemptID

hadoop,mapreduce,oozie

Just in case somebody else stumbles upon this error: it seemed this was caused by Hadoop running out of disk space... A pretty cryptic error for something that simple. I thought ~90GB would be enough to work on my 30GB dataset; I was wrong.

Oozie stack trace

hadoop,apache-spark,oozie

To get to a Java action's log, you could use Oozie's web console to find the Hadoop job ID of that action, and then use Hadoop's YARN web UI to look into that Hadoop job's mapper log. With the command-line interface, the above steps would be: run the oozie cmd to get...

Oozie on YARN - oozie is not allowed to impersonate hadoop

hadoop,yarn,oozie,ambari

Please update core-site.xml: <property> <name>hadoop.proxyuser.hadoop.groups</name> <value>*</value> </property> <property> <name>hadoop.proxyuser.hadoop.hosts</name> <value>*</value> </property> Also make sure the jobTracker address is the ResourceManager address, if that is not already the case. Once you update the core-site.xml file, it will work....

HDP 2.0 Oozie Error: E0803 : E0803: IO error, E0603

hadoop,oozie,hortonworks-data-platform

The wrong user was used during the installation process. This solved the problem: sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create -sqlfile /usr/lib/oozie/oozie.sql -run Instead of: sudo /usr/lib/oozie/bin/ooziedb.sh create -sqlfile /usr/lib/oozie/oozie.sql -run...

Writing MapReduce job to concurrently download files?

hadoop,oozie,cloudera-cdh

After looking deeper into this, it seems that creating an Oozie "fork" node would be the best approach. So I created a fork node, under which I created 6 shell actions that execute download.sh and take the list of file numbers as an argument. So I ended up modifying the...
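The fork/join pattern described above can be sketched as follows; only two of the six parallel branches are shown, and the node names, the shell-action schema version, and the single numeric argument are assumptions for illustration:

```xml
<fork name="download-fork">
  <path start="download-1"/>
  <path start="download-2"/>
  <!-- ...additional paths, one per parallel download -->
</fork>
<action name="download-1">
  <shell xmlns="uri:oozie:shell-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <exec>download.sh</exec>
    <argument>1</argument>
    <file>download.sh</file>
  </shell>
  <ok to="download-join"/>
  <error to="fail"/>
</action>
<!-- download-2 is defined the same way, with argument 2 -->
<join name="download-join" to="end"/>
```

All branches entering the join must finish successfully before the workflow proceeds, which is what makes the downloads run concurrently yet synchronize afterwards.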

oozie 4.0.1 build error

maven,oozie

This error signals that the JVM running Maven has run out of memory. It is caused by the maven-compiler-plugin. The best way to overcome this issue is to edit the maven compiler plugin in pom.xml as below: <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-compiler-plugin</artifactId> <version>2.3.2</version> <configuration> <verbose>true</verbose> <fork>true</fork> </configuration> </plugin> fork allows running the compiler in a...

Sqoop action erroring in oozie workflow

hadoop,sqoop,oozie

It looks like you need to install and configure a Teradata connector for Sqoop. See here: http://www.cloudera.com/content/support/en/downloads/download-components/download-products/downloads-listing/connectors/teradata.html...