hadoop,hortonworks-data-platform,ganglia
It turned out to be a proxy issue. To access the internet I had to add my proxy details to the file /var/lib/ambari-server/ambari-env.sh:
export AMBARI_JVM_ARGS=$AMBARI_JVM_ARGS' -Xms512m -Xmx2048m -Dhttp.proxyHost=theproxy -Dhttp.proxyPort=80 -Djava.security.auth.login.config=/etc/ambari-server/conf/krb5JAASLogin.conf -Djava.security.krb5.conf=/etc/krb5.conf -Djavax.security.auth.useSubjectCredsOnly=false'
When Ganglia was trying to access each node in the cluster, the request was going via the proxy...
hadoop,hdfs,bigdata,hortonworks-data-platform,disaster-recovery
First, if you want a DR solution you need to store this data somewhere outside of the main production site. This implies that the secondary site should have at least the same storage capacity as the main one. Now remember that the main ideas that led to HDFS were moving...
java,mysql,hive,sqoop,hortonworks-data-platform
Yes, you can do it via SSH. The Hortonworks Sandbox comes with SSH support pre-installed, so you can execute the Sqoop command via an SSH client on Windows. Or, if you want to do it programmatically (that's what I have done in Java), you have to follow these steps. Download the sshxcute Java...
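For the interactive route, a minimal sketch of running a Sqoop import over SSH (the host, database and credentials below are placeholder assumptions; the Hortonworks Sandbox normally exposes SSH on port 2222):
ssh -p 2222 root@127.0.0.1 "sqoop import --connect jdbc:mysql://dbhost/mydb --username dbuser -P --table mytable --target-dir /user/root/mytable"
Any Windows SSH client (PuTTY/plink, for example) can send the same command string.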
hortonworks-data-platform,sqoop2
Try this:
sqoop import --connect "jdbc:sqlserver://192.168.40.133:1434;database=AdventureWorksLT2012;username=test;password=test" --table ProductModel --hive-import -- --schema SalesLT --incremental append --check-column ProductModelID --last-value "128" ...
hadoop,zookeeper,hortonworks-data-platform,giraph
Yes, you can specify it each time you run a Giraph job by using the option -Dgiraph.zkList=localhost:2181. Alternatively, you can set it in the Hadoop configs so that you don't have to pass this option each time you submit a Giraph job. For that, add the following line to the conf/core-site.xml file...
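Based on the option above, the core-site.xml entry would look roughly like this (localhost:2181 is just the example value; use your own ZooKeeper host:port list):
<property>
  <name>giraph.zkList</name>
  <value>localhost:2181</value>
</property>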
sandbox,hortonworks-data-platform,hcatalog
I suspect this is due to memory. Your memory should be at least 4096 MB.
linux,hadoop,hortonworks-data-platform
What do you have specified as the hostname in /etc/ambari-agent/conf/ambari-agent.ini? I assume that it is 'namenode.localdomain.com'.
hadoop,mapreduce,hive,yarn,hortonworks-data-platform
A simple count operation involves a MapReduce job at the back end, and in your case that involves 10 million rows. Look here for a better explanation. This only covers what happens in the background and the execution time, not your question regarding memory requirements. At least,...
hadoop,bigdata,hortonworks-data-platform,ambari,hortonworks
Moving the NameNode to the same network as the DataNodes solved the problem. The DataNodes are in the 192.1.5.* network; the NameNode was in the 192.1.4.* network. Moving the NameNode to 192.1.5.* did the trick in my case....
google-compute-engine,hortonworks-data-platform,google-hadoop
bdutil is in fact designed to support custom extensions; you can certainly edit an existing one for an easy way to get started, but the recommended best practice is to create your own "_env.sh" extension which can be mixed in with other bdutil extensions if necessary. This way you can more...
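As a rough sketch only (the variable names and the -e flag are assumptions to verify against your bdutil version), a custom extension is just a shell file of overrides that gets mixed in at deploy time:
# my_env.sh: hypothetical custom bdutil extension
NUM_WORKERS=5
GCE_MACHINE_TYPE='n1-standard-4'
It would then be combined with other extensions along the lines of: ./bdutil -e extensions/spark/spark_env.sh,my_env.sh deploy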
java,hadoop,hortonworks-data-platform,hortonworks
First, delete all contents from the HDFS folder (the value of hadoop.tmp.dir):
rm -rf /grid/hadoop/hdfs
Make sure that the directory has the right owner and permissions (username according to your system):
sudo chown hduser:hadoop -R /grid/hadoop/hdfs
sudo chmod 777 -R /grid/hadoop/hdfs
Format the NameNode:
hadoop namenode -format
Try this:
sudo chown -R hdfs:hadoop /grid/hadoop/hdfs/dn...
hadoop,bigdata,hortonworks-data-platform,ambari,hortonworks
Use an FQHN (fully qualified hostname) on both the Ambari server and all of its client nodes.
cloudera,flume,hortonworks-data-platform,flume-ng,flume-twitter
Consider decreasing your channel's capacity and transactionCapacity settings:
capacity (default 100): the maximum number of events stored in the channel
transactionCapacity (default 100): the maximum number of events the channel will take from a source or give to a sink per transaction
These settings are responsible for controlling how many events get...
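For example, in the agent's properties file the tuning might look like this (the agent name a1 and channel name c1 are placeholders):
a1.channels.c1.type = memory
a1.channels.c1.capacity = 100
a1.channels.c1.transactionCapacity = 100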
hadoop,hadoop2,hortonworks-data-platform
I am not certain whether the examples are still shipped with HDP. You would want to search for hadoop-examples*.jar. If you are unable to locate the examples jar, you might need to download it. Unfortunately, it appears that Hadoop version 2 does not have a Maven artifact for it: http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-examples ...
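A quick way to look for the jar on a node (just a standard filesystem search, nothing HDP-specific):
find / -name 'hadoop*examples*.jar' 2>/dev/null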
hortonworks-data-platform,ambari
This great blog post helped us: http://www.swiss-scalability.com/2015/01/rename-host-in-ambari-170.html Basically, you will need to log into Ambari's database (not the GUI, the actual backend database). It's best to read the blog post in its entirety, but I am appending the important secret sauce that actually makes things happen. If you're on MySQL:...
hadoop,hadoop2,hortonworks-data-platform
To format the NameNode, you can use the following command, run as the 'hdfs' admin user:
/usr/bin/hdfs namenode -format
To start the NameNode daemon, use the hadoop-daemon.sh script:
/usr/hdp/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh start namenode
"--config $HADOOP_CONF_DIR" is an optional parameter here in case you want to reference a specific Hadoop configuration directory....
tableau,hortonworks-data-platform,tableau-server
I got an answer. First, go to Hive and enter the following query:
grant SELECT on table "table-name" to user hue;
where "table-name" is the table you want the user "hue" to view....
hadoop,sandbox,sqoop,hortonworks-data-platform
It's in /usr/hdp/2.2.0.0-2041/sqoop/lib
hadoop,mapreduce,sqoop,oozie,hortonworks-data-platform
So here is how I solved it. We are using CDH5 to run Camus to pull data from Kafka. We run CamusJob, which is responsible for getting data from Kafka, using the command line: hadoop jar... The problem is that the new hosts didn't get the so-called "yarn-gateway". Cloudera names the pack of...
hive,kerberos,hortonworks-data-platform,hue
Okay, found it (I had to debug the full Python stack to understand). It's not really advertised, but some hue.ini parameter names have changed:
beeswax_server_host --> hive_server_host
beeswax_server_port --> hive_server_port
It was defaulting hive_server_host to localhost, which is not correct on a secure cluster....
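In hue.ini that would look roughly like this (the section name is as I recall it and the host/port values are placeholders, so verify against your Hue version):
[beeswax]
  hive_server_host=hiveserver2.example.com
  hive_server_port=10000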
hadoop,oozie,cloudera-cdh,hortonworks-data-platform,oozie-coordinator
Generally speaking, Oozie has several advantages here: it generates a DAG each time, so you have a direct view of your workflow; easier access to the log files for each action, such as Hive, Pig, etc.; you keep the full history for each run of your task; better schedule...
hadoop,hadoop2,hortonworks-data-platform
hdfs dfs -get /hdfs/path /local/path
hdfs dfs -put /local/path /hdfs/path...
hadoop,hortonworks-data-platform,ambari
Partially fixed: it is necessary to stop all the HDFS services (JournalNodes, NameNodes and DataNodes) before editing the hdfs-site.xml file. Then, of course, Ambari's "start button" cannot be used, because the configuration would be overwritten... thus it is necessary to restart all the services manually. This is not the...
After spending plenty of time, I assumed that, despite having Internet connectivity, the local repositories would be needed. I installed an Apache server and made my repositories accessible as per the documentation. Then, in 'Advanced Repository Options', I replaced the web URL with the local repository URL and it registered the...
hadoop,oozie,hortonworks-data-platform
The Hortonworks Hadoop companion files contain an oozie-site.xml whose oozie.services property is missing the entry that enables ShareLibService. As a result, the new shared lib feature doesn't work, since the endpoint is not registered. To fix this, add an org.apache.oozie.service.ShareLibService entry to the oozie.services list. Be careful: the services are not independent, so the order matters!...
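As a sketch of where the entry goes in oozie-site.xml (the rest of the service list is elided here; keep your existing entries and their order):
<property>
  <name>oozie.services</name>
  <value>
    ...,
    org.apache.oozie.service.ShareLibService,
    ...
  </value>
</property>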
hadoop,hadoop-streaming,flume,hortonworks-data-platform,flume-ng
I was using Hortonworks Sandbox v2.2. After a long time of debugging, I found out there were some conflicts between the Spark version I installed manually (v1.2) and the Hortonworks Sandbox libraries, so I decided to use Cloudera QuickStart 5.3.0 and now everything is working fine.
apache,hadoop,hdfs,bigdata,hortonworks-data-platform
Apache Falcon simplifies the configuration of data motion with: replication; lifecycle management; lineage and traceability. This provides data governance consistency across Hadoop components. Falcon replication is asynchronous with delta changes. Recovery is done by running a process and swapping the source and target. Data loss – Delta data may...
oracle,hadoop,hive,hiveql,hortonworks-data-platform
Hive doesn't have a unique identifier for each row (rowid). But if you don't have any primary key or unique key values, you can use the analytic function row_number().
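A minimal sketch of generating a surrogate row id this way (table and column names are placeholders):
SELECT row_number() OVER (ORDER BY some_column) AS row_id, t.* FROM my_table t;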
hadoop,admin,hortonworks-data-platform
The issue is fixed. There was a permission issue with the user folder, i.e. /home/hduser. Somehow the permissions got changed.
hadoop,hive,hortonworks-data-platform,ambari
See if this JIRA helps with the issue you are hitting... https://issues.apache.org/jira/browse/AMBARI-10360...
java,hadoop,hdfs,hortonworks-data-platform
Check the value of the property fs.defaultFS in core-site.xml; it contains the IP address/hostname and port that the NameNode daemon binds to when it starts up. I see that you are using the Hortonworks Sandbox; here is the property from core-site.xml, which is located at /etc/hadoop/conf/core-site.xml:
<property> <name>fs.defaultFS</name> <value>hdfs://sandbox.hortonworks.com:8020</value> </property>
So, you...
hadoop,oozie,hortonworks-data-platform
The wrong user was used during the installation process. This solved the problem:
sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create -sqlfile /usr/lib/oozie/oozie.sql -run
instead of:
sudo /usr/lib/oozie/bin/ooziedb.sh create -sqlfile /usr/lib/oozie/oozie.sql -run...
hadoop,hive,hbase,hortonworks-data-platform
Do you already have a table created in HBase? You will first have to create a table in HBase with 'd' as a column family, and then you can import this TSV file into that table.
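As a sketch, assuming a table called 'mytable', a row key plus one column in family 'd', and a placeholder HDFS path for the TSV file:
create 'mytable', 'd'        (in the HBase shell)
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,d:col1 mytable /user/hue/data.tsv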
hadoop,hive,yarn,hortonworks-data-platform
I have been using Hortonworks, and you need to add the file/jar within the same session - as you have discovered.
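For example, within a single Hive session (the jar path and function details are placeholders, assuming the jar carries a UDF):
ADD JAR /tmp/my-udfs.jar;
CREATE TEMPORARY FUNCTION my_udf AS 'com.example.MyUDF';
SELECT my_udf(col) FROM my_table;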
hadoop,bigdata,mahout,hortonworks-data-platform
Hello fellow Hortonworker! Mahout is in the HDP repositories, but it's not available in the Ambari install wizard (i.e. Services -> Add Service). Therefore the only way to install it is via:
yum install mahout
As noted here, you should only install it on the master node. Also note that Mahout is...
apache,hadoop,hdfs,hortonworks-data-platform,database-mirroring
(Disclaimer: I work at WANdisco.) My view is that the products are complementary. Falcon does a lot of things besides data transfer, like setting up data workflow stages. WANdisco's products do active-active data replication (which means that data can be used equivalently from both the source and target clusters). In...