
Java - MySQL to Hive Import where MySQL Is Running on Windows and Hive Is Running on CentOS (Hortonworks Sandbox)

java,mysql,hive,sqoop,hortonworks-data-platform

Yes, you can do it via SSH. The Hortonworks Sandbox comes with SSH support preinstalled, so you can execute the sqoop command via an SSH client on Windows. Or, if you want to do it programmatically (that's what I have done, in Java), you have to follow these steps: download the sshxcute Java...
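For the non-programmatic route, here is a minimal sketch of running the import over SSH from Windows; the port, user and connection details are placeholders for a typical Hortonworks Sandbox setup, not taken from the original post, and MySQL on the Windows host must allow remote connections from the sandbox VM:

    ssh -p 2222 root@127.0.0.1 "sqoop import \
      --connect jdbc:mysql://<windows-host-ip>:3306/<database> \
      --username <mysql-user> --password <mysql-password> \
      --table <table> --hive-import -m 1"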

stop during import using sqoop

import,command,sqoop

Use Ctrl+C to terminate; re-entering HBase afterwards will work.

How do I access an HBase table in Hive & vice-versa?

hive,hbase,sqoop,apache-sqoop,apache-hive

HBase-Hive Integration: Creating an external table in Hive for an HBase table allows the HBase data to be queried in Hive without duplicating it. You can update or delete data in the HBase table and see the modified data in Hive too. Example: Consider...
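A minimal sketch of such an external table, run through the Hive CLI (the table and column names are made up for illustration):

    hive -e "CREATE EXTERNAL TABLE hbase_emp (key STRING, name STRING, salary STRING)
             STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
             WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:name,cf:salary')
             TBLPROPERTIES ('hbase.table.name' = 'emp');"

After this, SELECTs in Hive read the live HBase data, and changes made on the HBase side show up in Hive queries.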

NoSuchMethodError: JobConf.getCredentials()

java,mysql,hadoop,hive,sqoop

Fortunately, I found the answer to my question myself. I had to include the jar hadoop-0.20.2+737-core.jar in the build path instead of hadoop-0.20.2-core.jar. It looks like it is a modified version of the same file, containing the JobConf class with the getCredentials() method. The problem is solved, but I am still confused about those...

Encountered IOException running import job: java.io.IOException: Error returned by javac

bash,hadoop,jdbc,sqoop

[Solved] There was no need to change JAVA_HOME, HADOOP_MAPRED_HOME or SQOOP_HOME. As the error suggests, it was not able to find the Sqoop library, whereas I was using the Sqoop 1.4.4 library in my program. Please note that I wasn't using any build tool (Maven, SBT or anything); I was using normal...

How to make a Sqoop import produce only one file in HDFS

hdfs,sqoop

You can either merge the files, creating a new larger file, or set the number of mappers to 1 using -m 1 or --num-mappers 1.
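A sketch of both options (connection details and paths are placeholders):

    # one mapper, so Sqoop writes a single part file
    sqoop import --connect jdbc:mysql://<host>/<db> --username <user> --password <pass> \
      --table <table> --target-dir /data/<table> -m 1

    # or merge the existing part files into one local file afterwards
    hadoop fs -getmerge /data/<table> ./<table>_merged.txt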

Sqoop export error

mysql,hdfs,sqoop

There is a similar bug in Sqoop reported here. Please verify the MySQL connector version and Sqoop version you are using and update them as required. Hope this solves your problem. Thanks.

How to export data from hbase to SQL Server

sql-server,hbase,sqoop

Exporting data from HBase to an RDBMS is not supported so far. One workaround is to export the HBase data into a file in HDFS, which can then be exported into the RDBMS. For more information you can check Sqoop1 vs Sqoop2...
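A hedged sketch of that workaround, assuming the HBase table is already exposed to Hive through an external table (names, delimiters and the SQL Server connection are illustrative, and the ROW FORMAT clause on INSERT OVERWRITE DIRECTORY needs a reasonably recent Hive):

    # dump the HBase-backed table to delimited text in HDFS
    hive -e "INSERT OVERWRITE DIRECTORY '/tmp/hbase_dump'
             ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
             SELECT * FROM hbase_backed_table;"

    # push the dump into SQL Server with a plain Sqoop export
    sqoop export --connect 'jdbc:sqlserver://<host>:1433;databaseName=<db>' \
      --username <user> --password <pass> \
      --table target_table --export-dir /tmp/hbase_dump \
      --input-fields-terminated-by ','

The SQL Server JDBC driver jar has to be on Sqoop's classpath for the export step.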

Flume and sqoop limitation

performance,sqoop,flume

Using Sqoop to transfer a few terabytes from an RDBMS to HDFS is a great idea, highly recommended. This is Sqoop's intended use case, and it does so reliably. Flume is mostly intended for streaming data, so if the files all contain events and you get new files frequently, then Flume with...

Moving HDFS data into MongoDB

mongodb,hadoop,sqoop

The basic problem is that Mongo stores its data in BSON format (binary JSON), while your HDFS data may have different formats (text, sequence, Avro). The easiest thing to do would be to use Pig to load your results into MongoDB using this driver: https://github.com/mongodb/mongo-hadoop/tree/master/pig. You'll have to map...

Import data from oracle into hive using sqoop - cannot use --hive-partition-key

oracle,hadoop,hive,sqoop

In the --columns option, provide the names of all the columns you want to import except the partition column. We don't specify the partition column in --columns as it gets added automatically. Below is the example: I am importing the DEMO2 table with columns ID, NAME, COUNTRY, and I have not specified the COUNTRY column...
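A hedged sketch of the command for that example (the Oracle connection details and the partition value are placeholders; note COUNTRY is left out of --columns and supplied as the static partition value instead):

    sqoop import --connect jdbc:oracle:thin:@//<host>:1521/<service> \
      --username <user> --password <pass> \
      --table DEMO2 --columns "ID,NAME" \
      --hive-import --hive-table demo2 \
      --hive-partition-key COUNTRY --hive-partition-value 'INDIA' \
      -m 1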

Data in HDFS files not seen under hive table

hadoop,hive,sqoop,hadoop-partitioning

Hive's default delimiter is Ctrl-A (\001); if you don't specify any delimiter, it will take the default. Add the line below to your Hive script: row format delimited fields terminated by '\t'...
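A minimal sketch of a table definition carrying that clause (the table and columns are made up; this matches tab-delimited files, e.g. a Sqoop import run with --fields-terminated-by '\t'):

    hive -e "CREATE TABLE orders (id INT, customer STRING, amount DOUBLE)
             ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
             STORED AS TEXTFILE;"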

S3 to local hdfs data transfer using Sqoop

hadoop,sqoop

Sqoop is for getting data from relational databases only at the moment. Try using distcp for getting data from S3. The usage is documented here: http://wiki.apache.org/hadoop/AmazonS3 in the section "Running bulk copies in and out of S3"...
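A hedged sketch, assuming the s3n filesystem is configured with your AWS credentials (the bucket, paths and namenode address are placeholders):

    hadoop distcp s3n://<bucket>/<path> hdfs://<namenode>:8020/<target-path>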

sqoop export fails to load data into mysql from hive warehouse folder

mysql,export,hive,bigdata,sqoop

I found out the mistake: it was because of a column mismatch between the Hive and MySQL tables. Now it is working fine.

schedule and automate sqoop import/export tasks

shell,hadoop,automation,hive,sqoop

Run the script: $ ./script.sh 20   # for the 20th entry

$ cat script.sh
#!/bin/bash
PART_ID=$1
TARGET_DIR_ID=$PART_ID
echo "PART_ID:" $PART_ID "TARGET_DIR_ID: "$TARGET_DIR_ID
sqoop import --connect jdbc:oracle:thin:@hostname:port/service --username sqoop --password sqoop \
  --query "SELECT * FROM ORDERS WHERE orderdate = To_date('10/08/2013', 'mm/dd/yyyy') AND partitionid = '$PART_ID' AND rownum < 10001 AND \$CONDITIONS" \
  --target-dir...

Sqoop error while loading data from Hive to MySQL

hadoop,hive,sqoop

The location into which you placed the file does not appear to be correct. For a table "test" you should put the file underneath a directory named test, but your command hadoop fs -put test /user/cloudera creates a file called test. You would likely find more success as follows: hadoop fs...
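A hedged guess at the intended commands, following the paths in the question (the directory name has to match the table name "test"):

    hadoop fs -mkdir /user/cloudera/test
    hadoop fs -put test /user/cloudera/test/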

sqoop-export is failing when I have \N as data

hive,sqoop

Try adding these arguments to the export statement: --input-null-string "\\\\N" --input-null-non-string "\\\\N". From the documentation: if --input-null-string is not specified, then the string "null" will be interpreted as null for string-type columns; if --input-null-non-string is not specified, then both the string "null" and the empty string will be interpreted as...
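Put together, a hedged sketch of a full export with those arguments (the connection details and the Hive field delimiter \001 are assumptions about a default Hive warehouse layout):

    sqoop export --connect jdbc:mysql://<host>/<db> \
      --username <user> --password <pass> \
      --table <table> --export-dir /user/hive/warehouse/<table> \
      --input-fields-terminated-by '\001' \
      --input-null-string "\\\\N" --input-null-non-string "\\\\N"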

In Sqoop, file import, I would like to control the imported data within file splits using defined mappers

java,hadoop,sqoop,apache-sqoop

The range of salary is 5000 - 70000 (i.e. min 5000, max 70000), and salary is divided into 4 classes: (70000 - 5000) / 4 = 16250. Hence:
split 1: from 5000 to 21250 (= 5000 + 16250)
split 2: from 21250 to 37500 (= 21250 + 16250)
split 3: from 37500 to 53750 (= 37500 + 16250)
split 4: from 53750...

Apache Sqoop connectivity error

mysql,hadoop,jdbc,bigdata,sqoop

Try this instead:
sqoop list-databases --connect jdbc:mysql://localhost \
  --username root \
  --password aaaaaaaa
The problem was that there is no --user option in Sqoop; you have to use --username....

Arithmetic operation while joining two tables in a sqoop query

teradata,sqoop

YEAR and MONTH are not valid Teradata SQL; both are ODBC syntax, which is automatically rewritten by the ODBC driver. Try EXTRACT(YEAR FROM A.dt) instead....

Sqoop Export with Missing Data

sql,postgresql,shell,hadoop,sqoop

I solved the problem by changing my reduce function so that records without the correct number of fields output a certain placeholder value; I was then able to use --input-null-non-string with that value and it worked.

java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/home/hduser/sqoop/lib/hsqldb-1.8.0.10.jar

java,hadoop,sqoop

hdfs://localhost:9000/ is the Hadoop HDFS address. You can either change the property in your app or upload your jar to HDFS. You are showing the output of ls on your Linux file system, but hdfs://localhost:9000/ is the address of the Hadoop HDFS file system....
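A hedged sketch of the upload option, assuming the jar sits at the same path on the local file system (adjust the paths to your layout):

    hdfs dfs -mkdir -p /home/hduser/sqoop/lib
    hdfs dfs -put /home/hduser/sqoop/lib/hsqldb-1.8.0.10.jar /home/hduser/sqoop/lib/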

sqoop importing a string column of a dataset containing "," in it

sql,hadoop,hive,sqoop

Adding --fields-terminated-by '^' to the sqoop import solved a similar problem of mine.
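In context, a hedged sketch of the import with that flag (everything except --fields-terminated-by '^' is a placeholder):

    sqoop import --connect jdbc:mysql://<host>/<db> \
      --username <user> --password <pass> \
      --table <table> --fields-terminated-by '^' \
      --target-dir /data/<table> -m 1

The imported string columns can then contain commas without breaking the field boundaries.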

Sqoop installation export and import commands

hadoop,sqoop

Here is the complete procedure for installing Sqoop and for the import and export commands. Hopefully it may be helpful to someone; this is tried and tested by me and actually works. Download: apache.mirrors.tds.net/sqoop/1.4.4/sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz, then sudo mv sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz /usr/lib/sqoop, and copy-paste the following two lines into .bashrc: export SQOOP_HOME=/usr/lib/sqoop export...
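The two .bashrc lines presumably continue along these lines (the PATH line is an assumption based on the usual Sqoop setup, since the snippet is cut off):

    export SQOOP_HOME=/usr/lib/sqoop
    export PATH=$PATH:$SQOOP_HOME/bin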

Data import from MySQL with Apache Sqoop - Error : No manager for connect string

java,mysql,hadoop,sqoop

It looks like your JDBC URL is incorrect; it should be like jdbc:mysql://localhost/bbdatabank ...

Import from sql server to hbase, retrieved 395809 records from sql using sqoop, but only 365587 rows in hbase

sql-server,import,hbase,sqoop

It's difficult to say, but chances are conversation_id is not unique. For more interactive help on this subject, try the sqoop mailing lists.

Sqoop Import using Hue - using inner query

sqoop,hue

Hue will submit this script through an Oozie Sqoop Action, which has a particular way of specifying the arguments. Hue also comes with a built-in Sqoop example that you could try to modify for your import....

hive-drop-import-delims not removing newline while using HCatalog in Sqoop

oracle,hadoop,hive,sqoop,hcatalog

Use the --map-column-java option to explicitly state that the column is of type String. Then --hive-drop-import-delims works as expected (removing \n from the data). Changed Sqoop command:
sqoop import --connect jdbc:oracle:thin:@ORA_IP:ORA_PORT:ORA_SID \
  --username user123 --password passwd123 --table SCHEMA.TBL_2 \
  --hcatalog-table tbl2 --hcatalog-database testdb --num-mappers 1 \
  --split-by SOME_ID --columns col1,col2,col3,col4 --hive-drop-import-delims...

How does Hadoop run in “real-time” against non-stale data?

java,hadoop,hdfs,real-time,sqoop

Tools such as the Apache Spark Streaming API and Storm can be used for real-time stream processing.

How does mapper output get written to HDFS in case of Sqoop?

java,hadoop,mapreduce,hdfs,sqoop

If there are no reducers, then the mapper output is written to HDFS. Even in this case the mapper output is not written directly to HDFS; it is written to the individual node's disk and then copied over to HDFS. Sqoop is one scenario where it is typically a map-only job, wherein you...

How do I import from a MySQL database to Datastax DSE Hive using sqoop?

sqoop,datastax-enterprise,datastax

Sounds like you're trying to do something similar to slide 47 in this deck: http://www.slideshare.net/planetcassandra/escape-from-hadoop. The strategy Russell uses there is the Spark MySQL driver, so there is no need to deal with Sqoop; you do have to add the dependency to your Spark classpath for it to work. No need...

sqoop import unable to locate sqoop-1.4.6.jar

hadoop,sqoop

Try this:
1. Create a directory in HDFS: hdfs dfs -mkdir /usr/lib/sqoop
2. Copy the sqoop jar into HDFS: hdfs dfs -put /usr/lib/sqoop/sqoop-1.4.6.jar /usr/lib/sqoop/
3. Check whether the file exists in HDFS: hdfs dfs -ls /usr/lib/sqoop
4. Import using sqoop: sqoop import --connect jdbc:mysql://localhost/<dbname> --username root --password [email protected] --table <tablename> -m...

Incremental loads in Sqoop

hadoop,hive,teradata,sqoop

If you use --incremental lastmodified mode then your --check-column is a timestamp that does not need to be numeric or unique. See: Sqoop incremental imports....
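A hedged sketch of such an import (the connection details, column names and starting timestamp are placeholders):

    sqoop import --connect jdbc:mysql://<host>/<db> \
      --username <user> --password <pass> \
      --table orders \
      --incremental lastmodified --check-column last_updated_ts \
      --last-value '2015-01-01 00:00:00' \
      --target-dir /data/orders --append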

HDP 2.2 Sandbox Could not find SQOOP directory

hadoop,sandbox,sqoop,hortonworks-data-platform

It's in /usr/hdp/2.2.0.0-2041/sqoop/lib

How to create external table in Hive using sqoop. Need suggestions

hadoop,hive,sqoop

Sqoop does not support creating Hive external tables. Instead you might:
1. Use the Sqoop codegen command to generate the SQL for creating the Hive internal table that matches your remote RDBMS table (see http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_codegen_literal)
2. Modify the generated SQL to create a Hive external table
3. Execute the modified SQL in Hive...

sqoop-merge can this command use on hbase import?

hbase,sqoop

sqoop-merge doesn't support HBase, but running a new import (even from another SQL table) will overwrite the data in HBase. You can provide a custom --where clause plus custom --columns to update just the data you need without affecting the rest of the data already stored in HBase: sqoop import --connect...
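A hedged sketch of such a targeted re-import into HBase (all names, columns and the filter are placeholders):

    sqoop import --connect jdbc:mysql://<host>/<db> \
      --username <user> --password <pass> \
      --table customers \
      --columns "id,status,updated_at" \
      --where "updated_at > '2015-06-01'" \
      --hbase-table customers --column-family cf --hbase-row-key id \
      -m 1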

ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver

mysql,sqoop

If it is a version-incompatibility issue, then you can give mysql-connector-java-5.1.31.jar a try; I am using mysql-connector-java-5.1.31.jar with Sqoop version 1.4.5, and for me it works for both data import and export use cases.

Can Apache Sqoop and Flume be used interchangeably?

hadoop,bigdata,sqoop,flume

Flume and Sqoop are designed for different kinds of data sources. Sqoop works with any RDBMS that supports JDBC connectivity. Flume, on the other hand, works well with streaming data sources such as log data that is generated continuously in your environment. Specifically, Sqoop...

how to deal with sqoop import delimiter issues \r\n

mysql,oracle,postgresql,hive,sqoop

In the newer versions of Sqoop you have the --hive-drop-import-delims and --hive-delims-replacement options. See https://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html. They will deal with the \r, \n and \001 characters in your string fields. For other replacements, your workaround with the REPLACE function is the way to go...

Loading data into Hadoop

hadoop,mapreduce,apache-pig,sqoop,flume

Question 1: Answer: C. Explanation: You need to join user-profile records and weblogs. The weblogs are already ingested into HDFS, so in order to join them with the user profiles, we need to bring the user profiles into HDFS as well. The user profiles reside in an OLTP database, so to import them into HDFS we need...

SQOOP connection-param-file format

hadoop,parameters,connection-string,sqoop

If you want to give your database connection string and credentials, create a file with those details and use --options-file in your sqoop command. Create a file database.props with the following details, one argument per line:
import
--connect
jdbc:mysql://localhost:5432/test_db
--username
root
--password
password
Then your sqoop import command will look like: sqoop --options-file...
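Presumably the cut-off command continues along these lines; since the options file starts with "import", the remaining arguments are just appended (the file path, table name and target directory are placeholders):

    sqoop --options-file /home/<user>/database.props \
      --table test_table --target-dir /data/test_table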

getting error while using sqoop 1.4.5 and hadoop 2.41

java,hadoop,import,sqoop

This type of error usually comes up when there is a version conflict, so please make sure your Sqoop version is compatible with your Hadoop distribution. And if you are using a third-party connector for importing data, it should also be compatible with your Sqoop version. I am using...

How to insert and Update simultaneously to PostgreSQL with sqoop command

postgresql,hadoop,hive,sqoop

According to my internet search, it is not possible to perform both insert and update directly to a PostgreSQL DB. Instead you can create a stored procedure/function in PostgreSQL and send the data through it: sqoop export --connect <url> --call <upsert proc> --export-dir /results/bar_data. The stored procedure/function should perform both the update and the insert....

sqoop unable to import table with dot

hive,sqoop

Odoo uses an underscore as the delimiter, not a dot.

How to import MySql table into a targeted database in hive?

hadoop,hive,sqoop

Import a MySQL table into Hive:
sqoop import --connect jdbc:mysql://localhost:3306/mysqldatabase --table mysqltablename --username mysqlusername --password mysqlpassword --hive-import --hive-table hivedatabase.hivetablename --warehouse-dir /user/hive/warehouse
Changes to be made:
mysqldatabase -- your MySQL database name from which the table is to be imported to Hive.
mysqltablename -- your MySQL table name to be imported...

Sqoop Job via Oozie HDP 2.1 not creating job.splitmetainfo

hadoop,mapreduce,sqoop,oozie,hortonworks-data-platform

So here is the way I solved it. We are using CDH5 to run Camus to pull data from Kafka. We run CamusJob, which is responsible for getting data from Kafka, from the command line: hadoop jar... The problem was that the new hosts didn't get the so-called "yarn-gateway". Cloudera names packs of...

Should HBase be installed on the client side? Is sqoop an API? Is Drill an API?

hadoop,hbase,sqoop,apache-spark

"I think HBase is not a core component of Hadoop, hence as a client, what should I do?" HBase is indeed not a core component of Hadoop. To use it, you need to install HBase on top of your Hadoop cluster. It depends on HDFS/ZooKeeper. It does not depend on...

E0701 XML schema error in OOZIE workflow

sqoop,oozie

Confirm that you have copied your workflow.xml to HDFS. You need not copy job.properties to HDFS, but you do have to copy all the other files and libraries there.

Oozie cant able to find JDBC drivers in Sqoop

hadoop,sqoop,oozie,sqoop2

You need to add all the lib files, such as the JDBC drivers, to the Oozie share-lib folder, inside its sqoop folder. This should resolve your issue. To check which library files are invoked/used by the job, go to the job tracker for the corresponding job, and in the syslogs you will...
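A hedged sketch of adding a MySQL driver, assuming the classic share-lib layout (the exact HDFS path varies with the Oozie version, e.g. newer versions use a timestamped lib_<date> directory, and -sharelibupdate exists only in more recent Oozie releases):

    hdfs dfs -put mysql-connector-java-5.1.31.jar /user/oozie/share/lib/sqoop/
    oozie admin -oozie http://<oozie-host>:11000/oozie -sharelibupdate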

How can I customize Sqoop Import serialization from Mysql to HBase?

mysql,serialization,import,hbase,sqoop

You can represent 0x1E (up arrow) with CHAR(30) and 0x1F (down arrow) with CHAR(31); therefore, you can provide a free-form query and perform the replacements. This should achieve exactly what you're looking for: sqoop import --connect jdbc:mysql://localhost:3306/[db] \ --username [user] --password [pwd] \ --query 'SELECT CONCAT(email_address,updated_date) as id, REPLACE(REPLACE(modification,":",CHAR(31),uri),"|",CHAR(30),uri) as...
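A hedged, cleaned-up sketch of that approach (the database, table and HBase target names are placeholders loosely based on the truncated snippet; the nested REPLACE calls swap ":" for CHAR(31) and "|" for CHAR(30)):

    sqoop import --connect jdbc:mysql://localhost:3306/<db> \
      --username <user> --password <pwd> \
      --query 'SELECT CONCAT(email_address, updated_date) AS id,
                      REPLACE(REPLACE(modification, ":", CHAR(31)), "|", CHAR(30)) AS modification
               FROM <table> WHERE $CONDITIONS' \
      --hbase-table <hbase_table> --column-family data --hbase-row-key id \
      -m 1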

Accessing Vertica Database through Oozie sqoop

sqoop,oozie,vertica

As the error states: Could not load db driver class: dbDriver. There are likely two problems: the JDBC URL is probably incorrect, and the JDBC jar needs to be included in the workflow. For the JDBC URL, make sure it looks like this: jdbc:vertica://VerticaHost:portNumber/databaseName. For the JDBC jar, it needs to...

Sqoop action erroring in oozie workflow

hadoop,sqoop,oozie

Looks like you need to install and configure a Teradata connector for Sqoop. See here: http://www.cloudera.com/content/support/en/downloads/download-components/download-products/downloads-listing/connectors/teradata.html...

Sqoop cannot find mysqldump when using direct import into HDFS

hdfs,mysqldump,sqoop

The mysqldump utility is used by the "direct connector". The reason it cannot be found is that the mysqldump binary is either not on your system or not on the PATH environment variable when Sqoop runs the MapReduce job. Things that will help: it seems like you're running...
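A few hedged checks along those lines (package names vary by distribution, and mysqldump has to be available on every node that runs the map tasks):

    which mysqldump              # verify the binary is present
    sudo yum install -y mysql    # on CentOS/RHEL this package provides mysqldump
    export PATH=$PATH:/usr/bin   # make sure its directory is on PATH for the user running Sqoop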