As a developer, I've created an HBase table for our project by importing data from an existing MySQL table using
a Sqoop job. The problem is that our data analyst team is familiar with MySQL syntax, which means they can query
Hive tables easily. For them, I need to expose the HBase table in Hive. I don't want to duplicate the data by populating it again in Hive, since duplicated data might cause consistency issues in the future.
Can I expose the HBase table in Hive without duplicating data? If yes, how do I do it? Also, if I
insert/update/delete data in my HBase table, will the updated data appear in Hive without any issues?
Sometimes our data analytics team creates tables and populates them in Hive. Can I expose those tables to HBase? If yes, how?
Best How To :
An external table in Hive defined over an HBase table lets you query the HBase data from Hive without duplicating it. You can update or delete data in the HBase table, and the modified data is visible in Hive as well.
Suppose you have an HBase table whose columns match the mapping used in the command below (id:id, name:username, name:password, email:email).
Sample external table command for Hive:
CREATE EXTERNAL TABLE hivehbasetable(key INT, id INT, username STRING, password STRING, email STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,id:id,name:username,name:password,email:email")
TBLPROPERTIES ("hbase.table.name" = "hbasetable");
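Once the external table exists, analysts can query it with ordinary HiveQL. The queries below are a minimal sketch that assumes the hivehbasetable definition above has been created and that a row with key 1 exists (the key value is only an illustration). The entries in hbase.columns.mapping are matched to the Hive columns by position, so :key feeds the key column, id:id feeds id, and so on.

```sql
-- Assumes hivehbasetable has been created as shown above.
-- Hive reads from HBase when the query runs, so puts and deletes
-- made in HBase are reflected the next time the query is executed.
SELECT key, username, email
FROM hivehbasetable
WHERE key = 1;   -- illustrative row key, not taken from the original post

-- Count the rows currently visible through the mapping.
SELECT COUNT(*) AS user_count
FROM hivehbasetable;
```

Because no data is copied into Hive, there is nothing to refresh or re-import after the HBase table changes.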
For more information on Hive-HBase integration, see the HBase integration page in the Apache Hive documentation.
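For the reverse direction raised in the question (data created in Hive that should be visible in HBase), one possible approach is to create an HBase-backed table from Hive with the same storage handler and copy the Hive data into it. The sketch below uses hypothetical names throughout: hive_source (an existing Hive table), hive_to_hbase, the HBase table name hive_backed_table, and the column family cf. Support for non-external HBase-backed tables varies between Hive releases, so check the documentation for your version before relying on it.

```sql
-- Hypothetical names: hive_source, hive_to_hbase, hive_backed_table, cf.
-- Without the EXTERNAL keyword, Hive creates the HBase table itself.
CREATE TABLE hive_to_hbase(key INT, username STRING, email STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:username,cf:email")
TBLPROPERTIES ("hbase.table.name" = "hive_backed_table");

-- Copy rows from the existing Hive table into the HBase-backed table.
INSERT OVERWRITE TABLE hive_to_hbase
SELECT key, username, email
FROM hive_source;
```

Unlike the external table above, this approach does copy the data once, from the Hive table into HBase. After that, hive_to_hbase and the HBase table hive_backed_table refer to the same stored rows.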