Structure of bag:
emp = LOAD '...../emp.csv' using PigStorage(',') AS
This bag contains details of employees. I want to split the data based on job.
Bag = split emp into mngr if job == 'MANAGER';
This is not working and gives an error.
If I include one more condition with it,
for example sal10k if sal<10000, then it works. But why not with only one condition?
I am new to Hadoop Pig and know only a few basics. Kindly help.
Best How To :
Kindly find the solution to the problem below, along with a basic explanation of the SPLIT operator:
- The SPLIT operator partitions a relation into two or more new relations, so you need to supply at least two conditions, like IF and ELSE. For instance: IF (something matches), make Relation1; IF (NOT (something matches)), make another relation. (Pig has no ELSE keyword; Pig 0.10 and later provide OTHERWISE for the remaining rows.)
- SPLIT is a standalone statement, meaning that you can't assign its result to a relation:
Example: Bag = split emp into mngr if job == 'MANAGER'; // This is wrong.
A SPLIT can't be assigned to an alias. It runs as its own statement in the Grunt shell or in a script, like this:
SPLIT emp INTO managers IF(job MATCHES '.*MANAGER.*'),not_managers IF(NOT(job MATCHES '.*MANAGER.*'));
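If only the managers are needed, as in the question, FILTER may be a simpler fit than SPLIT, because FILTER returns a relation that can be assigned to an alias. The following is a minimal sketch, not a tested script: the file name and schema are assumptions for illustration, and `mngr` is the alias name taken from the question.

```pig
-- Assumed input: a comma-separated file with name, id, job and salary columns.
emp = LOAD 'stackfile.txt' USING PigStorage(',')
      AS (ename:chararray, id:int, job:chararray, sal:double);

-- FILTER produces a single relation, so assigning it to an alias is valid.
mngr = FILTER emp BY job == 'MANAGER';

-- Print the filtered rows to the console.
DUMP mngr;
```

Unlike SPLIT, this needs only one condition, since the rows that do not match are simply dropped.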
Here is an example script for your reference:
emp = LOAD 'stackfile.txt' USING PigStorage(',') AS (ename:chararray,id:int,job:chararray,sal:double);
SPLIT emp INTO managers IF(job MATCHES '.*MANAGER.*'),not_managers IF(NOT(job MATCHES '.*MANAGER.*'));
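As a variation on the script above, the second condition can be written with OTHERWISE instead of repeating the negated test. This is a sketch that assumes Pig 0.10 or later, where OTHERWISE is available, and it reuses the same file name and schema as the example. The DUMP statements are added only to inspect the two resulting relations.

```pig
emp = LOAD 'stackfile.txt' USING PigStorage(',')
      AS (ename:chararray, id:int, job:chararray, sal:double);

-- Rows whose job contains MANAGER go to managers;
-- OTHERWISE sends the remaining rows to not_managers (assumes Pig 0.10+).
SPLIT emp INTO managers IF (job MATCHES '.*MANAGER.*'),
               not_managers OTHERWISE;

-- Each relation created by SPLIT can be used like any other alias.
DUMP managers;
DUMP not_managers;
```

Note that the SPLIT line is still a statement on its own; only the aliases it creates (`managers`, `not_managers`) are referenced afterwards.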