You need two files in nutch/conf: gora.properties, where you declare that you are going to use the MongoDB backend, and gora-mongodb-mapping.xml (notice the dash, not the dot you wrote), where you map the names in Gora entities to the fields in the datastore. The version you are using I really think...
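As a rough illustration of what the first file holds, here is a small sketch that parses a minimal gora.properties body with java.util.Properties. The key gora.datastore.default and the MongoStore class name are the ones Gora uses; the gora.mongodb.* keys are written as I remember them from the Gora MongoDB module, so check them against your version. The server address and database name are placeholders, and the class name GoraPropertiesSketch is made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch only: shows the shape of a gora.properties file for a MongoDB backend.
class GoraPropertiesSketch {

    // Example file body. Host, port and database name are placeholder values.
    static final String SAMPLE = String.join("\n",
            "gora.datastore.default=org.apache.gora.mongodb.store.MongoStore",
            "gora.mongodb.mapping.file=/gora-mongodb-mapping.xml",
            "gora.mongodb.servers=localhost:27017",
            "gora.mongodb.db=nutch");

    // Parses a properties body held in a string.
    static Properties parse(String body) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(body));
        } catch (IOException e) {
            // Reading from a String should not fail; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = parse(SAMPLE);
        // Print the backend class and the mapping file the sample declares.
        System.out.println("backend: " + props.getProperty("gora.datastore.default"));
        System.out.println("mapping: " + props.getProperty("gora.mongodb.mapping.file"));
    }
}
```

Only the four key=value lines belong in the real file; the Java around them is just a way to show how they are read.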
If you set neither the start key nor the end key, the query will retrieve the whole table, like the select you mention (at least with HBase).
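To make the "no start key, no end key means everything" behaviour concrete, here is a small sketch that mimics it with a sorted map. It is not the Gora API: a TreeMap stands in for a key-sorted table such as HBase, and the names KeyRangeSketch, sampleTable and scan are made up for this example. Treating both bounds as inclusive is a choice made for the sketch.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustration only: a TreeMap plays the role of a key-sorted table.
class KeyRangeSketch {

    // Builds a tiny table with three rows keyed "a", "b" and "c".
    static NavigableMap<String, String> sampleTable() {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("a", "row a");
        table.put("b", "row b");
        table.put("c", "row c");
        return table;
    }

    // Returns the rows from startKey to endKey, both inclusive.
    // A null bound means "unbounded" on that side, so (null, null) is a full scan.
    // Assumes startKey <= endKey when both are given.
    static NavigableMap<String, String> scan(NavigableMap<String, String> table,
                                             String startKey, String endKey) {
        NavigableMap<String, String> view = table;
        if (startKey != null) {
            view = view.tailMap(startKey, true);
        }
        if (endKey != null) {
            view = view.headMap(endKey, true);
        }
        return view;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = sampleTable();
        System.out.println("no bounds:  " + scan(table, null, null).keySet());
        System.out.println("start only: " + scan(table, "b", null).keySet());
        System.out.println("both:       " + scan(table, "a", "b").keySet());
    }
}
```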
Note that in Nutch 2.2.1 you have to change the storage backend in nutch/conf/gora.properties. As for the test error, you can run mvn package -DskipTests....
At the moment the SQL module of Gora is disabled because of some issues, so it does not meet your needs :( Stand by... it will be enabled again in a future version. Anyway, some explanation about Gora: Gora is an object mapping framework (not specifically relational). We can say it is focused on NoSQL...
The key line is: [WARNING] Unable to autodetect 'javac' path, using 'javac' from the environment. If you're running Maven from Eclipse (e.g. Run As: Maven Install), make sure your environment is configured with the correct Java installation (you'll need a JDK, not a JRE). Go to Window -> Preferences -> Java -> Installed JREs. Select...
The solution is to apply the patch from https://issues.apache.org/jira/browse/NUTCH-1946 to your project. This patch updates Gora to 0.6, which contains the fix for this problem. If you run into a RuntimeException during the GeneratorJob, add the following to your nutch-site.xml: <property> <name>io.serializations</name> <value>org.apache.hadoop.io.serializer.WritableSerialization</value> <description>A list of serialization classes that...
You should go to $NUTCH_HOME/runtime/deploy to run the command.