Getting many memory errors when running my web crawler for a few days [on hold]

Tag: java,memory,memory-management,out-of-memory

I am developing a web crawler application. When I run the program, I get the error messages below:

[Screenshot of the error messages; image not preserved]

I got these errors after running the program for more than 3 hours. I tried to allocate more memory by changing the eclipse.ini setting to 2048 MB of RAM, as was suggested in this topic, but I still get the same errors after 3 hours or less. I need to run the program for more than 2-3 days non-stop in order to analyse the results.

Can you tell me what I am missing that causes these errors?

These are my classes:

seeds.txt

http://www.stanford.edu
http://www.archive.org

WebCrawler.java

package pkg.crawler;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.net.MalformedURLException;
import java.net.SocketTimeoutException;
import java.util.*;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.TimeUnit;

import org.jsoup.HttpStatusException;
import org.jsoup.UnsupportedMimeTypeException;
import org.joda.time.DateTime;


public class WebCrawler {

public static Queue <LinkNodeLight> queue = new PriorityBlockingQueue <> (); // priority queue
public static final int n_threads = 5;                                 // amount of threads
private static Set<String> processed = new LinkedHashSet <> ();         // set of processed urls
private PrintWriter out;                                                // output file
private PrintWriter err;                                                // error file
private static Integer cntIntra = new Integer (0);                              // counters for intra- links in the queue
private static Integer cntInter = new Integer (0);                              // counters for inter- links in the queue
private static Integer dub = new Integer (0);                                   // amount of skipped urls

public static void main(String[] args) throws Exception {
    System.out.println("Running web crawler: " + new Date());

    WebCrawler webCrawler = new WebCrawler();
    webCrawler.createFiles();
    try (Scanner in = new Scanner(new File ("seeds.txt"))) {
        while (in.hasNext()) {
            webCrawler.enque(new LinkNode (in.nextLine().trim()));
        }
    } catch (IOException e) {
        e.printStackTrace();
        return;
    }
    webCrawler.processQueue();
    webCrawler.out.close();
    webCrawler.err.close();
}

public void processQueue(){
    /* run in threads */
    Runnable r = new Runnable() {
        @Override 
        public void run() {
            /* queue may be empty but process is not finished, that's why we need to check if any links are being processed */
            while (true) {
                LinkNode link = deque();
                if (link == null)
                    continue;
                link.setStartTime(new DateTime());
                boolean process = processLink(link);
                link.setEndTime(new DateTime());
                if (!process)
                    continue;
                /* print the data to the csv file */
                if (link.getStatus() != null && link.getStatus().equals(LinkNodeStatus.OK)) {
                    synchronized(out) {
                        out.println(getOutputLine(link));
                        out.flush();
                    }
                } else {
                    synchronized(err) {
                        err.println(getOutputLine(link));
                        err.flush();
                    }
                }
            }
        }
    };
    /* run n_threads threads which perform dequeue and process */
    LinkedList <Thread> threads = new LinkedList <> ();
    for (int i = 0; i < n_threads; i++) {
        threads.add(new Thread(r));
        threads.getLast().start();
    }
    for (Thread thread : threads) {
        try {
            thread.join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}


/* returns true if link was actually processed */
private boolean processLink(LinkNode inputLink) {
    String url = getUrlGeneralForm(inputLink);
    boolean process = true;
    synchronized (processed) {
        if (processed.contains(url)) {
            process = false;
            synchronized (dub) {dub++;}
        } else
            processed.add(url);
    }
    /* process the url only if it has not been processed yet and is not currently being processed */
    if (process) {
        System.out.println("Processing url " + url);
        List<LinkNodeLight> outputLinks = parseAndWieghtResults(inputLink);
        for (LinkNodeLight outputLink : outputLinks) {
            String getUrlGeneralForumOutput = getUrlGeneralForm(outputLink);
            /* add the new link to the queue only if it has not been processed yet */
            process = true;
            synchronized (processed) {
                if (processed.contains(getUrlGeneralForumOutput)) {
                    process = false;
                    synchronized (dub) {dub++;}
                }
            }
            if (process) {
                enque(outputLink);
            }
        }
        return true;
    }
    return false;
}

void enque(LinkNodeLight link){
    link.setEnqueTime(new DateTime());
    /* the add method requires implicit priority */
    synchronized (queue) {
        if (link.interLinks)
            synchronized (cntInter) {cntInter++;}
        else
            synchronized (cntIntra) {cntIntra++;}
      //queue.add(link, 100 - (int)(link.getWeight() * 100.f));
        queue.add(link);
    }
}


/**
 * Picks an element from the queue
 * @return top element from the queue or null if the queue is empty
 */
LinkNode deque(){
    /* link must be checked */
    LinkNode link = null;
    synchronized (queue) {
        link = (LinkNode) queue.poll();
        if (link != null) {
            link.setDequeTime(new DateTime());
            if (link.isInterLinks())
                synchronized (cntInter) {cntInter--;}
            else
                synchronized (cntIntra) {cntIntra--;}
        }
    }
    return link;
}

private void createFiles() {
    /* create output file */
    try {
        out = new PrintWriter(new BufferedWriter(new FileWriter("CrawledURLS.csv", false)));
        out.println(generateHeaderFile());
    } catch (IOException e) {
        System.err.println(e);
    }
    /* create error file */
    try {
        err = new PrintWriter(new BufferedWriter(new FileWriter("CrawledURLSERROR.csv", false)));
        err.println(generateHeaderFile());
    } catch (IOException e) {
        System.err.println(e);
    }
}
/**
 * formats the string so it can be valid entry in csv file
 * @param s
 * @return
 */
private static String format(String s) {
    // replace " by ""
    String ret = s.replaceAll("\"", "\"\"");
    // put string into quotes
    return "\"" + ret + "\"";
}
/**
 * Creates the line that needs to be written in the outputfile
 * @param link
 * @return
 */
public static String getOutputLine(LinkNode link){
    StringBuilder builder = new StringBuilder();
    builder.append(link.getParentLink()!=null ? format(link.getParentLink().getUrl()) : "");
    builder.append(",");
    builder.append(link.getParentLink()!=null ? link.getParentLink().getIpAdress() : "");
    builder.append(",");
    builder.append(link.getParentLink()!=null ? link.getParentLink().linkProcessingDuration() : "");
    builder.append(",");
    builder.append(format(link.getUrl()));
    builder.append(",");
    builder.append(link.getDomain());
    builder.append(",");
    builder.append(link.isInterLinks());
    builder.append(",");
    builder.append(Util.formatDate(link.getEnqueTime()));
    builder.append(",");
    builder.append(Util.formatDate(link.getDequeTime()));
    builder.append(",");
    builder.append(link.waitingInQueue());
    builder.append(",");
    builder.append(queue.size());
    /* Inter and intra links in queue */
    builder.append(",");
    builder.append(cntIntra.toString());
    builder.append(",");
    builder.append(cntInter.toString());
    builder.append(",");
    builder.append(dub);
    builder.append(",");
    builder.append(new Date ());
    /* URL size*/
    builder.append(",");
    builder.append(link.getSize());
    /* HTML file
    builder.append(",");
    builder.append(link.getFileName());*/
    /* add HTTP error */
    builder.append(",");
    if (link.getParseException() != null) {
        if (link.getParseException() instanceof HttpStatusException)
            builder.append(((HttpStatusException) link.getParseException()).getStatusCode());
        if (link.getParseException() instanceof SocketTimeoutException)
            builder.append("Time out");
        if (link.getParseException() instanceof MalformedURLException)
            builder.append("URL is not valid");
        if (link.getParseException() instanceof UnsupportedMimeTypeException)
            builder.append("Unsupported mime type: " + ((UnsupportedMimeTypeException)link.getParseException()).getMimeType());
    }
    return builder.toString();

}

/**
 * Generates the header line for the output files.
 * @return the comma-separated column names
 */
private String generateHeaderFile(){
    StringBuilder builder = new StringBuilder();
    builder.append("Seed URL");
    builder.append(",");
    builder.append("Seed IP");
    builder.append(",");
    builder.append("Process Duration");
    builder.append(",");
    builder.append("Link URL");
    builder.append(",");
    builder.append("Link domain");
    builder.append(",");
    builder.append("Is inter link"); // this column holds link.isInterLinks(), not an IP
    builder.append(",");
    builder.append("Enque Time");
    builder.append(",");
    builder.append("Deque Time");
    builder.append(",");
    builder.append("Waiting in the Queue");
    builder.append(",");
    builder.append("QueueSize");
    builder.append(",");
    builder.append("Intra in queue");
    builder.append(",");
    builder.append("Inter in queue");
    builder.append(",");
    builder.append("Duplicates skipped");
    /* time was printed, but no header was */
    builder.append(",");
    builder.append("Time");
    /* URL size*/
    builder.append(",");
    builder.append("Size bytes");
    /* HTTP errors */
    builder.append(",");
    builder.append("HTTP error");
    return builder.toString();

}



String getUrlGeneralForm(LinkNodeLight link){
    String url = link.getUrl();
    if (url.endsWith("/")){
        url = url.substring(0, url.length() - 1);
    }
    return url;
}


private List<LinkNodeLight> parseAndWieghtResults(LinkNode inputLink) {
    List<LinkNodeLight> outputLinks = HTMLParser.parse(inputLink);
    if (inputLink.hasParseException()) {
        return outputLinks;
    } else {
        return URLWeight.weight(inputLink, outputLinks);
    }
}
}

HTMLParser.java

package pkg.crawler;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Writer;
import java.math.BigInteger;
import java.util.Formatter;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;
import java.security.*;
import java.nio.file.Path;
import java.nio.file.Paths;


public class HTMLParser {

private static final int READ_TIMEOUT_IN_MILLISSECS = (int) TimeUnit.MILLISECONDS.convert(30, TimeUnit.SECONDS);
private static HashMap <String, Integer> filecounter = new HashMap<> ();


public static List<LinkNodeLight> parse(LinkNode inputLink){
    List<LinkNodeLight> outputLinks = new LinkedList<>();
    try {
        inputLink.setIpAdress(IpFromUrl.getIp(inputLink.getUrl()));
        String url = inputLink.getUrl();
        if (inputLink.getIpAdress() != null) {
            // NOTE: String.replace returns a new String; the result is discarded here, so url is unchanged
            url.replace(URLWeight.getHostName(url), inputLink.getIpAdress());
        }
        Document parsedResults =  Jsoup
                .connect(url)
                .timeout(READ_TIMEOUT_IN_MILLISSECS)
                .get();
        inputLink.setSize(parsedResults.html().length());
        /* IP address moved here in order to speed up the process */
        inputLink.setStatus(LinkNodeStatus.OK);
        inputLink.setDomain(URLWeight.getDomainName(inputLink.getUrl()));
        if (true) {
            /* save the file to the html */
            String filename = parsedResults.title();//digestBig.toString(16) + ".html";
            if (filename.length() > 24) {
                filename = filename.substring(0, 24);
            }
            filename = filename.replaceAll("[^\\w\\d\\s]", "").trim();
            filename = filename.replaceAll("\\s+",  " ");

            if (!filecounter.containsKey(filename)) {
                filecounter.put(filename, 1);
            } else {
                Integer tmp = filecounter.remove(filename);
                filecounter.put(filename, tmp + 1);
            }
            filename = filename + "-" + (filecounter.get(filename)).toString() + ".html";
            filename = Paths.get("downloads", filename).toString();
            inputLink.setFileName(filename);
            /* write the page to a file named after its sanitized title */
            try (PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter(filename)))) {
                out.println("<!--" + inputLink.getUrl() + "-->");
                out.print(parsedResults.html());
                // try-with-resources flushes and closes the writer
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        String tag;
        Elements tagElements;
        List<LinkNode> result;


        tag = "a[href";
        tagElements = parsedResults.select(tag);
        result = toLinkNodeObject(inputLink, tagElements, tag);
        outputLinks.addAll(result);


        tag = "area[href";
        tagElements = parsedResults.select(tag);
        result = toLinkNodeObject(inputLink, tagElements, tag);
        outputLinks.addAll(result);
    } catch (IOException e) {
        inputLink.setParseException(e);
        inputLink.setStatus(LinkNodeStatus.ERROR);
    }

    return outputLinks;
}


static List<LinkNode> toLinkNodeObject(LinkNode parentLink, Elements tagElements, String tag) {
    List<LinkNode> links = new LinkedList<>();
    for (Element element : tagElements) {

        if(isFragmentRef(element)){
            continue;
        }

        String absoluteRef = String.format("abs:%s", tag.contains("[") ? tag.substring(tag.indexOf("[") + 1, tag.length()) : "href");
        String url = element.attr(absoluteRef);

        if(url!=null && url.trim().length()>0) {
            LinkNode link = new LinkNode(url);
            link.setTag(element.tagName());
            link.setParentLink(parentLink);
            links.add(link);
        }
    }
    return links;
}

static boolean isFragmentRef(Element element){
    String href = element.attr("href");
    return href!=null && (href.trim().startsWith("#") || href.startsWith("mailto:"));
}

}

Util.java

package pkg.crawler;

import java.util.Date;

import org.joda.time.DateTime;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;


public class Util {

private static final DateTimeFormatter formatter =
        DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss:SSS");


public static String linkToString(LinkNode inputLink){


    return String.format("%s\t%s\t%s\t%s\t%s\t%s",
            inputLink.getUrl(),
            inputLink.getWeight(),
            formatDate(inputLink.getEnqueTime()),
            formatDate(inputLink.getDequeTime()),
            differenceInMilliSeconds(inputLink.getDequeTime(), inputLink.getEnqueTime()),
            inputLink.getParentLink()==null?"":inputLink.getParentLink().getUrl()
    );
}

public static String linkToErrorString(LinkNode inputLink){

    return String.format("%s\t%s\t%s\t%s\t%s\t%s",
            inputLink.getUrl(),
            inputLink.getWeight(),
            formatDate(inputLink.getEnqueTime()),
            formatDate(inputLink.getDequeTime()),
            inputLink.getParentLink()==null?"":inputLink.getParentLink().getUrl(),
            inputLink.getParseException().getMessage()
    );
}


public static String formatDate(DateTime date){
    return formatter.print(date);
}

public static long differenceInMilliSeconds(DateTime dequeTime, DateTime enqueTime){
    return (dequeTime.getMillis()- enqueTime.getMillis());
}

public static int differenceInSeconds(Date enqueTime, Date dequeTime){
    return (int)((dequeTime.getTime()/1000) - (enqueTime.getTime()/1000));
}

public static int differenceInMinutes(Date enqueTime, Date dequeTime){
    return (int)((dequeTime.getTime()/60000) - (enqueTime.getTime()/60000));
}

}

URLWeight.java

package pkg.crawler;

import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.regex.Pattern;

public class URLWeight {

public static List<LinkNodeLight> weight(LinkNode sourceLink, List<LinkNodeLight> links) {

    List<LinkNodeLight> interLinks = new LinkedList<>();
    List<LinkNodeLight> intraLinks = new LinkedList<>();

    for (LinkNodeLight link : links) {
        if (isIntraLink(sourceLink, link)) {
            intraLinks.add(link);
            link.setInterLinks(false);
        } else {
            interLinks.add(link);
            link.setInterLinks(true);
        }
    }



static boolean isIntraLink(LinkNodeLight sourceLink, LinkNodeLight link){

    String parentDomainName = getHostName(sourceLink.getUrl());

    String childDomainName = getHostName(link.getUrl());
    return parentDomainName.equalsIgnoreCase(childDomainName);
}

public static String getHostName(String url) {
    if (url == null) {
        return "";
    }

    String domainName = url;

    int index = domainName.indexOf("://");
    if (index != -1) {

        domainName = domainName.substring(index + 3);
    }
    for (int i = 0; i < domainName.length(); i++)
        if (domainName.charAt(i) == '?' || domainName.charAt(i) == '/') {
            domainName = domainName.substring(0, i);
            break;
        }

    /*if (index != -1) {

        domainName = domainName.substring(0, index);
    }*/

    /* have to keep www in order to do replacements with IP */
    //domainName = domainName.replaceFirst("^www.*?\\.", "");

    return domainName;
}
public static String getDomainName(String url) {
    String [] tmp= getHostName(url).split("\\.");
    if (tmp.length == 0)
        return "";
    return tmp[tmp.length - 1];
}


}

PingTaskManager.java

package pkg.crawler;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PingTaskManager {

private static ExecutorService executor = Executors.newFixedThreadPool(100);

public  static void ping (LinkNode e) {
    executor.submit(new PingTaks(e));
}


}

class PingTaks implements Runnable {
private final LinkNode link;

public PingTaks(LinkNode link) {
    this.link = link; // keep the link so run() can use it
}

@Override
public void run() {
    /* link.ping(); */      
}


}

LinkNodeStatus.java

package pkg.crawler;

public enum LinkNodeStatus {
OK,
ERROR

}

LinkNodeLight.java

package pkg.crawler;

import org.joda.time.DateTime;

public class LinkNodeLight implements Comparable<LinkNodeLight> {
protected String url;
protected float weight;
protected DateTime enqueTime;
protected boolean interLinks;

public String getUrl() {
    return url;
}

public float getWeight() {
    return weight;
}

public void setWeight(float weight) {
    this.weight = weight;
}

public DateTime getEnqueTime() {
    return enqueTime;
}


public LinkNodeLight(String url) {
    this.url = url;
}


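public void setEnqueTime(DateTime enqueTime) {
    this.enqueTime = enqueTime;
}

// accessors used by WebCrawler and URLWeight
public boolean isInterLinks() {
    return interLinks;
}

public void setInterLinks(boolean interLinks) {
    this.interLinks = interLinks;
}

// higher weight sorts first, so the priority queue returns it first
@Override
public int compareTo(LinkNodeLight link) {
    if (this.weight < link.weight) return 1;
    else if (this.weight > link.weight) return -1;
    return 0;
}
}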

LinkNode.java

package pkg.crawler;

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.Socket;
import java.net.URL;
import java.net.UnknownHostException;
import java.util.Date;



import org.joda.time.DateTime;


public class LinkNode extends LinkNodeLight{
public LinkNode(String url) {
    super(url);
}

private String tag;
private LinkNode parentLink;
private IOException parseException = null; // initialize parse Exception with null
// weight is inherited from LinkNodeLight; a second field here would shadow it
private DateTime dequeTime;
private DateTime startTime;
private DateTime endTime;
private LinkNodeStatus status;
private String ipAdress;
private int size;
private String filename;
private String domain;

public DateTime getStartTime() {
    return startTime;
}

public void setStartTime(DateTime startTime) {
    this.startTime = startTime;
}

public DateTime getEndTime() {
    return endTime;
}

public void setEndTime(DateTime endTime) {
    this.endTime = endTime;
}

public DateTime getDequeTime() {
    return dequeTime;
}

public String getTag() {
    return tag;
}

public LinkNode getParentLink() {
    return parentLink;
}

public Exception getParseException() {
    return parseException;
}

public boolean hasParseException(){
    return parseException!=null;
}


public void setDequeTime(DateTime dequeTime) {
    this.dequeTime = dequeTime;
}

public void setTag(String tag) {
    this.tag = tag;
}

public void setParentLink(LinkNode parentLink) {
    this.parentLink = parentLink;
}

public void setParseException(IOException parseException) {
    this.parseException = parseException;
}

@Override
public boolean equals(Object o) {
    if (this == o) {
        return true;
    }
    if (o == null || getClass() != o.getClass()) {
        return false;
    }

    LinkNode link = (LinkNode) o;

    if (url != null ? !url.equals(link.url) : link.url != null) {
        return false;
    }

    return true;
}

@Override
public int hashCode() {
    return url != null ? url.hashCode() : 0;
}

public long waitingInQueue(){
    return Util.differenceInMilliSeconds( dequeTime,enqueTime );
}

public long linkProcessingDuration(){
    return Util.differenceInMilliSeconds( endTime,startTime );
}

@Override
public String toString() {
    StringBuilder sb = new StringBuilder("LinkNode{");
    sb.append("url='").append(url).append('\'');
    sb.append(", score=").append(weight);
    sb.append(", enqueTime=").append(enqueTime);
    sb.append(", dequeTime=").append(dequeTime);
    sb.append(", tag=").append(tag);
    if(parentLink!=null) {
        sb.append(", parentLink=").append(parentLink.getUrl());
    }
    sb.append('}');
    return sb.toString();
}

public void setStatus(LinkNodeStatus status) {
    this.status = status;
}

public LinkNodeStatus getStatus(){
    if (status == null) {
        status = LinkNodeStatus.ERROR;
    }
    return status;
}

// check server link is it exist or not
/* this method gives fake errors
public LinkNodeStatus ping () {

    boolean reachable = false;
    String sanitizeUrl = url.replaceFirst("^https", "http");

    try {
        HttpURLConnection connection = (HttpURLConnection) new URL(sanitizeUrl).openConnection();
        connection.setConnectTimeout(1000);
        connection.setRequestMethod("HEAD");
        int responseCode = connection.getResponseCode();
        System.err.println(url + " " + responseCode);
        reachable = (200 <= responseCode && responseCode <= 399);
    } catch (IOException exception) {
    }
    return reachable?LinkNodeStatus.OK: LinkNodeStatus.ERROR;
}*/


public String getIpAdress() {
    return ipAdress;
}

public void setIpAdress(String ipAdress) {
    this.ipAdress = ipAdress;
}

/* methods for controlling url size */
public void setSize(int size) {
    this.size = size;
}

public int getSize() {
    return this.size;
}

public void setFileName(String filename) {
    this.filename = filename;
}

public String getFileName() {
    return this.filename;
}

public String getDomain() {
    return domain;
}

public void setDomain(String domain) {
    this.domain = domain;
}
}

Best How To :

I tried to allocate more memory by changing the eclipse.ini setting to 2048 MB of RAM, as was suggested in this topic, but I still get the same errors after 3 hours or less.

I hate to repeat myself(*), but in eclipse.ini you set up the memory for Eclipse, which has nothing to do with the memory for your crawler.

When using the command line, you need to start it via java -Xmx2G pkg.crawler.WebCrawler.

When starting from Eclipse, you need to add -Xmx2G to the run configuration ("VM arguments" rather than "Program arguments").
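
To confirm that the setting actually reached the crawler's JVM, you can print the maximum heap from inside the program. A minimal sketch, assuming a throwaway class named HeapCheck (a hypothetical name, not part of the crawler); it relies only on the standard Runtime and RuntimeMXBean APIs:

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper class, not part of the crawler in the question.
public class HeapCheck {

    /** Maximum heap this JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    /** True if any of the given JVM arguments sets -Xmx explicitly. */
    static boolean hasExplicitMaxHeap(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xmx")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (not the program arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("-Xmx given:    " + hasExplicitMaxHeap(jvmArgs));
        System.out.println("Max heap:      " + maxHeapMiB() + " MiB");
    }
}
```

Run it with the same run configuration as the crawler. If -Xmx2G took effect, it should show up in the JVM arguments and the reported maximum should be roughly 2 GiB; if you only see the JVM default, the option was applied to Eclipse itself and not to the launched program.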


(*) Link to a deleted question; requires some reputation to view.

type conversion if flex

java,actionscript-3,flex

You try to cast data type mx.collections:IList to UI component type spark.components:List, which of course leads to exception. Try to follow the error message hint and use mx.collections:IList: screenList.addAll(event.result as IList); ...

Bulkheading strategies for Akka actors

java,asynchronous,akka,blocking,future

If I understand this correctly, you kind of have two options here: you listen to a Future being completed or you do something with the result: If you want to listen, you can use some callback like final ExecutionContext ec = system.dispatcher(); future.onSuccess(new OnSuccess<String>() { public void onSuccess(String result) {...

Dynamic creation of objects vs storing them as fields

java,performance,object

There won't be any difference, since you've only changed the scope of the variables. Since you're not using the variables outside of the scope, the generated bytecode will be identical as well (you can try it out with javap). So use the second style for clarity. Edit: In fact if...

Can I install 2 or more Android SDK when using Eclipse

java,android,eclipse,sdk,versions

There shouldn't be any problem if you use the latest SDK version ; actually, this is recommended. However, make sure to set the correct "Target SDK", i.e. the highest android version you have successfully tested your app with, and the "Minimum Required SDK" as well....

Javadoc: Do parameter and return need an explicit type description

java,types,javadoc

No, there's no need, the JavaDoc tool parses the Java code and gets the types from there. This article on the Oracle Java site may be useful: How to Write Doc Comments for the Javadoc Tool From the @param part of that article: The @param tag is followed by the...

How to check if an ExecutionResult is empty in Neo4j

java,neo4j

An execution result is essentially an iterator of a map, its type definition is something like: Iterable<Map<String,Object>> So you can easily just do: result.iterator().hasNext(); I think that its strictly a ResourceIterator, so if you get an iterator you are supposed to close it if you don't exhaust it. Check the...

Join files using Apache Spark / Spark SQL

java,apache-spark,apache-spark-sql

If you use plain spark you can join two RDDs. let a = RDD<Tuple2<K,T>> let b = RDD<Tuple2<K,S>> RDD<Tuple2<K,Tuple2<S,T>>> c = a.join(b) This produces an RDD of every pair for key K. There are also leftOuterJoin, rightOuterJoin, and fullOuterJoin methods on RDD. So you have to map both datasets to...

Java dice roll with unexpected random number

java,if-statement

else { System.out.println(diceNumber); } You are printing the address of diceNumber by invoking its default toString() function in your else clause. That is why you are getting the [email protected] The more critical issue is why it gets to the 'else' clause, I believe that is not your intention. Note: In...

Mysterious claim of a missing { in eclipse

java,eclipse

In Java, you cannot write executable statements directly in class.So this is syntactically wrong: for(int i=0; i<10; i++) { this.colorList[i] = this.allColors[this.r.nextInt(this.allColors.length)]; } Executable statements can only be in methods/constructors/code blocks...

count items in a column vaadin

java,vaadin

Columns don't contain items, Rows contain items. You can set the visible columns by passing a array to the setVisibleColumns methos of the Table. It could also be a idea, to just colapse the column, not hiding it... Determining if all values of this colum are empty should be simple...

Exception in thread “main” java.util.InputMismatchException: For input string: “1234567891011”

java

InputMismatchException - if the next token does not match the Integer regular expression, or is out of range. Integer.MIN_VALUE: -2147483648 Integer.MAX_VALUE: 2147483647 Instead of int use long long z = sc.nextLong(); ...

why java API prevents us to call add and remove together?

java,list,collections,listiterator

You're reading the wrong documentation: you should read ListIterator's javadoc. It says: Throws: ... IllegalStateException - if neither next nor previous have been called, or remove or add have been called after the last call to next or previous Now, if you want a reason, it's rather simple. You're playing...

How to call MySQL view in Struts2 or Hibernate

java,mysql,hibernate,java-ee,struts2

You can simply create an Entity, that's mapping the database view: @Entity public class CustInfo { private String custMobile; private String profession; private String companyName; private Double annualIncome; } Make sure you include an @Id in your view as well, if that's an updatable view. Then you can simply use...

WebDriver can't get dropdown menu element (Java)

java,selenium,webdriver,junit4

Your ID is dynamic, so you can't use it. Select will not work in your case, you just need to use two clicks WebElement dropdown = driver.findElement(By.xpath("//div[@class='select-pad-wrapper AttributePlugin']/input")); dropdown.click(); WebElement element = driver.findElement(By.xpath("//div[@class='select-pad-wrapper AttributePlugin']/div/ul/li[text()='Image']")); element.click(); ...

Interpreting hex dump of java class file

java,class,hex

The 000000b0 is not part of the data. It's the memory address where the following 16 bytes are located. The two-digit hex numbers are the actual data. Read them from left to right. Each row is in two groups of eight, purely to asist in working out memory addresses etc....

Reading and modifying the text from the text file in Java

java

I wrote a quick method for you that I think does what you want, i.e. remove all occurrences of a token in a line, where that token is embedded in the line and is identified by a leading dash. The method reads the file and writes it straight out to...

viewResolver with more folders inside of WEB-INF/jsp is not working in spring

java,spring,jsp,spring-mvc

Say you have a jsp test.jsp under /WEB-INF/jsp/reports From your controller return @RequestMapping("/helloWorld") public String helloWorld(Model model) { model.addAttribute("message", "Hello World!"); return "reports/test"; } ...

Logging operations in lightadmin

java,spring,logging,lightadmin

You can use the class AbstractRepositoryEventListener like it's show on the LightAdmin documentation here Add you logger insertion by overiding onAfterSave, onAfterCreate and onAfterDelete into your own RepositoryEventListener. After you just need to register your listener like this public class YourAdministration extends AdministrationConfiguration<YourObject> { public EntityMetadataConfigurationUnit configuration(EntityMetadataConfigurationUnitBuilder configurationBuilder) { return...

Android String if-statement

java,android,string

Correct me if I'm wrong. If you're saying that your code looks like this: new Thread(new Runnable() { public void run() { // thread code if (ready.equals("yes")) { // handler code } // more thread code }).start(); // later on... ready = "yes"; And you're asking why ready = "yes"...

error: cannot find symbol class AsyncCallWS Android

java,android,web-services

In the link you posted, I see a class like the one below. Create this class in your project before using it. private class AsyncCallWS extends AsyncTask<String, Void, Void> { @Override protected Void doInBackground(String... params) { Log.i(TAG, "doInBackground"); getFahrenheit(celcius); return null; } @Override protected void onPostExecute(Void result) { Log.i(TAG, "onPostExecute"); tv.setText(fahren +...

C++11 Allocation Requirement on Strings

c++,string,c++11,memory,standards

Section 21.4.1.5 of the 2011 standard states: The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size(). The two...

How to do custom rounding of numbers in Java?

java,rounding

Math.floor(x+0.7) should do it. This should work for an arbitrary mantissa. Just add the offset to the next integer to your value and round down. The rounding is done by floor. Here is what the Java API says about floor: Returns the largest (closest to positive infinity) double value that...
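As a minimal sketch of the idea above: adding 0.7 before calling floor means a value rounds up once its fractional part reaches roughly 0.3. The class and method names here are hypothetical, and values that sit exactly on the threshold can be affected by floating-point representation.

```java
class CustomRounding {
    // Rounds x down after shifting it by the given offset.
    // With offset 0.7, fractional parts from about 0.3 upward round up.
    static double roundWithOffset(double x, double offset) {
        return Math.floor(x + offset);
    }

    public static void main(String[] args) {
        System.out.println(roundWithOffset(2.5, 0.7)); // fractional part above the threshold
        System.out.println(roundWithOffset(2.1, 0.7)); // fractional part below the threshold
    }
}
```

Passing the offset as a parameter keeps the threshold adjustable; an offset of 0.5 gives a floor-based round-half-up.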

setOnClickListener error Null object

java,android

After super.onCreate(savedInstanceState); insert setContentView(R.layout.YourLayout); you need to make a request to a server in another thread. It might look like public class LoginTask extends AsyncTask<Void, Void, String>{ private String username; private String password; private Context context; public LoginTask(Context context, String username, String password) { this.username = username; this.password = password;...

@TransactionAttribute(TransactionAttributeType.REQUIRES_NEW) doesn't work

java,jpa,glassfish,ejb-3.0

deleteEmployee method is not wrapped into a new transaction because you are referencing method on this. What you can do is to inject reference to the facade itself and then call deleteEmployee method on it (it should be public). More or less something like this: @Stateless public class MyFacade {...

Form submit portlet with Spring MVC

java,jsp,spring-mvc,liferay,portlet

Which version of Liferay are you using? If it is > 6.2 GA1, then add this attribute to your liferay-portlet.xml file, recompile, and test again. <requires-namespaced-parameters>false</requires-namespaced-parameters> Liferay adds a namespace to the request parameters by default. You need to disable it. ...

BitmapFont class does not have getBound(String) method

java,android,libgdx

Since API 1.5.6 there is a different way to get the string bounds. Try this: GlyphLayout layout = new GlyphLayout(); layout.setText(bitmapFont,"text"); float width = layout.width; float height = layout.height; It's not recommended to create a new GlyphLayout on each frame; create it once and reuse it. ...

Using world coordinates

java,libgdx

You shouldn't use a constant pixel-to-unit conversion, as this would lead to different behavior on different screen sizes/resolutions. Also don't forget about different aspect ratios; you need to take care of them as well. The way you should solve this problem is using Viewports. Some of them support virtual screen sizes,...

App Not Downloading Newest Version Of File [Java]

java,caching,download

Use URLConnection.setUseCaches(boolean);. In your case, it would be connection.setUseCaches(false);...
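A minimal sketch of that call in context, assuming the file is fetched through a plain URLConnection. The class name, helper name, and URL are placeholders, not taken from the question.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;

class NoCacheDownload {
    // Opens a connection object with protocol-level caching disabled.
    // No network traffic happens until the caller connects or reads from it.
    static URLConnection openUncached(String address) throws IOException {
        URLConnection connection = URI.create(address).toURL().openConnection();
        connection.setUseCaches(false); // must be set before connecting
        return connection;
    }

    public static void main(String[] args) throws IOException {
        URLConnection connection = openUncached("http://example.com/data.txt"); // placeholder URL
        System.out.println("useCaches = " + connection.getUseCaches());
    }
}
```

The setting has to be applied before the connection is established; calling it afterwards throws an IllegalStateException.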

Getting particular view from expandable listview

java,android,listview,android-fragments,expandablelistview

You shouldn't pass your view item from one fragment to another. You should retrieve the object associated with your group view and pass this object to your second/edition fragment. You can use setTargetFragment(...) and onActivityResult(...) to send the modified text from your second to your first fragment. And then you...

how to call Java method which returns any List from R Language? [on hold]

java,r,rjava

You can do it with rJava package. install.packages('rJava') library(rJava) .jinit() jObj=.jnew("JClass") result=.jcall(jObj,"[D","method1") Here, JClass is a Java class that should be in your ClassPath environment variable, method1 is a static method of JClass that returns double[], [D is a JNI notation for a double array. See that blog entry for...

Get current latitude and longitude android

java,android,gps,geolocation,location

See my post at http://gabesechansoftware.com/location-tracking/. The code you're using is just broken. It should never be used. The behavior you're seeing is one of the bugs- it doesn't handle the case of getLastLocation returning null, an expected failure. It was written by someone who kind of knew what he was...

Numeric literals in Java - octal? [duplicate]

java,literals,octal

-0777 is treated by the compiler as an octal number (base 8) whose decimal value is -511 (-(64*7+8*7+7)). -777 is a decimal number.
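The difference can be checked directly. This is a small sketch with hypothetical class and constant names, showing that a leading zero switches an integer literal to base 8.

```java
class OctalLiteral {
    static final int OCTAL = -0777;  // leading 0: digits are read in base 8
    static final int DECIMAL = -777; // no leading 0: ordinary base 10

    public static void main(String[] args) {
        // 7*64 + 7*8 + 7 = 511, so the octal literal is -511 in decimal.
        System.out.println(OCTAL);
        System.out.println(DECIMAL);
        // Integer.parseInt with an explicit radix makes the base visible in the code.
        System.out.println(-Integer.parseInt("777", 8));
    }
}
```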

Get the value of the last inserted record

java,jdbc

You may try this query: select stop_name from behaviour where created_at in (select max(created_at) from behaviour) ...

How can implement long running process in spring hibernate?

java,spring,hibernate

I recommend using Spring's DeferredResult. It's a Future implementation that uses the HTTP long polling technique. http://docs.spring.io/spring-framework/docs/3.2.0.BUILD-SNAPSHOT/api/org/springframework/web/context/request/async/DeferredResult.html So let's say you make a request: the server returns the DeferredResult, and the request is then kept open until the internal process (Hibernate)...

A beginner questions about printf, java

java,string,printf

I'm sad that this question hasn't been answered, and on top of that, I can't upvote it from its -8 because I don't have enough reputation. It seems downvoting is getting too unwarranted here. OP is just looking for an answer, which can be answered here and found online, he has tried...

Selenium catch popup on close browser

java,selenium,browser

Instead of using driver.quit() to close the browser, closing it using the Actions object may work for you. This is another way to close the browser using the keyboard shortcuts. Actions act = new Actions(driver); act.sendKeys(Keys.chord(Keys.CONTROL+"w")).perform(); Or, if there are multiple tabs opened in driver window: act.sendKeys(Keys.chord(Keys.CONTROL,Keys.SHIFT+"w")).perform(); ...

Get element starting with letter from List

java,android,list,indexof

The indexOf method doesn't accept a regex pattern. Instead you could do a method like this: public static int indexOfPattern(List<String> list, String regex) { Pattern pattern = Pattern.compile(regex); for (int i = 0; i < list.size(); i++) { String s = list.get(i); if (s != null && pattern.matcher(s).matches()) { return...
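The excerpt above is cut off, so here is a self-contained sketch of the same approach. It assumes the method returns the index of the first element that fully matches the regex and -1 when nothing matches; that return convention and the class name are assumptions, not confirmed by the excerpt.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class PatternIndex {
    // Returns the index of the first non-null element that fully matches regex,
    // or -1 if no element matches (assumed convention, mirroring List.indexOf).
    static int indexOfPattern(List<String> list, String regex) {
        Pattern pattern = Pattern.compile(regex); // compile once, reuse for every element
        for (int i = 0; i < list.size(); i++) {
            String s = list.get(i);
            if (s != null && pattern.matcher(s).matches()) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> fruits = Arrays.asList("apple", null, "banana", "blueberry");
        System.out.println(indexOfPattern(fruits, "b.*")); // first element starting with 'b'
    }
}
```

Note that matches() requires the whole string to match, so "starts with b" is written as "b.*" rather than "b".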

Unfortunately, (My app) has stopped. Eclipse Android [duplicate]

java,android,eclipse,adt

In your MainActivity.java at line 34 you are trying to initialize a widget that is not present in the XML layout you set in setContentView(R.layout.... That's why you are getting a NullPointerException. EDIT: change your setContentView(R.layout.activity_main) to setContentView(R.layout.fragment_main)...

Java Scanner not reading newLine after wrong input in datatype verification while loop

java,while-loop,java.util.scanner

You are reading too much from the scanner! In this line while (sc.nextLine() == "" || sc.nextLine().isEmpty()) you are basically reading a line from the scanner, comparing it (*) with "", then forgetting it, because you read the next line again. So if the first read line really contains the...
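One way to avoid the double read is to call nextLine() once per iteration and keep the result in a variable. The sketch below uses a hypothetical helper and class name, and checks emptiness with isEmpty(), since == on strings compares references rather than contents.

```java
import java.util.Scanner;

class ReadNonEmptyLine {
    // Returns the first non-empty line from the scanner,
    // or null if the input ends before one is found.
    static String nextNonEmptyLine(Scanner sc) {
        while (sc.hasNextLine()) {
            String line = sc.nextLine(); // read exactly once per iteration
            if (!line.isEmpty()) {
                return line;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Scanner sc = new Scanner("\n\nhello\nworld\n"); // stand-in for System.in
        System.out.println(nextNonEmptyLine(sc));
        System.out.println(nextNonEmptyLine(sc));
    }
}
```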

How to block writes to standard output in java (System.out.println())

java,logging,stdout

If you can identify the thread you want to "mute" reliably somehow (e.g. by name), you can setOut to your own stream which will only delegate the calls to the actual System.out if they don't come from the muted thread.
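A rough sketch of that suggestion, assuming the thread to mute can be recognised by its name. MutingOutput and the thread name "noisy" are made up for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;

class MutingOutput extends OutputStream {
    private final OutputStream target;    // the real destination, e.g. the original System.out
    private final String mutedThreadName; // writes from a thread with this name are dropped

    MutingOutput(OutputStream target, String mutedThreadName) {
        this.target = target;
        this.mutedThreadName = mutedThreadName;
    }

    private boolean muted() {
        return Thread.currentThread().getName().equals(mutedThreadName);
    }

    @Override
    public void write(int b) throws IOException {
        if (!muted()) target.write(b);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        if (!muted()) target.write(buf, off, len);
    }

    @Override
    public void flush() throws IOException {
        target.flush();
    }

    // Replaces System.out with a filtering stream that wraps the current one.
    static void install(String mutedThreadName) {
        System.setOut(new PrintStream(new MutingOutput(System.out, mutedThreadName), true));
    }

    public static void main(String[] args) throws InterruptedException {
        install("noisy");
        Thread t = new Thread(() -> System.out.println("this line is dropped"), "noisy");
        t.start();
        t.join();
        System.out.println("this line is printed");
    }
}
```

The check runs at write time on the calling thread, so it only works if the muted code prints from the thread it is identified by, not from a pool thread with a different name.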

Android Implicit Intent for Viewing a Video File

java,android,android-intent,uri,avd

Change your onClick method to below code. You should give the option to choose the external player. @Override public void onClick(View v) { Intent intent = new Intent(Intent.ACTION_VIEW); intent.setDataAndType(Uri.parse("https://youtu.be/jxoG_Y6dvU8"), "video/*"); startActivity(Intent.createChooser(intent, "Complete action using")); } ...

Android set clickable text to go one fragment to another fragment

java,android,android-fragments,spannablestring

If LoginActivity is a fragment class, then it is fine to use setOnClickListener on the TextView. But to switch fragments you have to replace the Intent with a FragmentTransaction. Use something like: textview.setOnClickListener(new View.OnClickListener() { @Override public void onClick(View v) { getFragmentManager().beginTransaction().replace(R.id.container, new LoginActivity()).addToBackStack("").commit(); } }); But, if you want to...

@RestController throws HTTP Status 406

java,spring,rest,maven

The issue is with the dependencies that you have in pom.xml file. In Spring 4.1.* version the pom.xml dependency for Jackson libraries should include these: <dependency> <groupId>com.fasterxml.jackson.core</groupId> <artifactId>jackson-core</artifactId> <version>2.4.1</version> </dependency> <dependency> <groupId>com.fasterxml.jackson.core</groupId> <artifactId>jackson-databind</artifactId> <version>2.4.1.1</version> </dependency> You...

custom arraylist get distinct

java,android

It's not possible to do this using only the ArrayList. Either implement your own method which can be as simple as: private List<mystatistik> getAllUniqueEnemies(List<mystatistik> list){ List<mystatistik> uniqueList = new ArrayList<mystatistik>(); List<String> enemyIds = new ArrayList<String>(); for (mystatistik entry : list){ if (!enemyIds.contains(entry.getEnemyId())){ enemyIds.add(entry.getEnemyId()); uniqueList.add(entry); } } return uniqueList; } Or...

PropertyNotFoundException in jsp

java,jsp

The name of your getter & setter is wrong. By convention it must be: public Integer getSurvey_id() { return survey_id; } public void setSurvey_id(Integer survey_id) { this.survey_id=survey_id; } ...

Get document on some condition in elastic search java API

java,elasticsearch,elasticsearch-plugin

When indexing documents in this form, Elasticsearch will not be able to parse those strings as dates correctly. In case you transformed those strings to correctly formatted timestamps, the only way you could perform the query you propose is to index those documents in this format { "start": "2010-09", "end":...

Get the min and max value of several items with Comparable

java

You should not let BehaviourItem implement Comparable as it doesn’t have a natural order. Instead, implement different Comparators for the different properties. Note that in Java 8, you can implement such a Comparator simply as Comparator<BehaviourItem> orderBySpeed=Comparator.comparingInt(BehaviourItem::getSpeed); which is the equivalent of Comparator<BehaviourItem> orderBySpeed=new Comparator<BehaviourItem>() { public int compare(BehaviourItem a, BehaviourItem...
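To connect this back to the question of finding the minimum and maximum, a comparator like this can be passed to Collections.min and Collections.max. In the sketch below, the BehaviourItem fields, its constructor, and the MinMaxBySpeed helper are assumptions for illustration; only the class name and getSpeed come from the excerpt.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class BehaviourItem {
    private final String name; // hypothetical field
    private final int speed;

    BehaviourItem(String name, int speed) {
        this.name = name;
        this.speed = speed;
    }

    String getName() { return name; }
    int getSpeed() { return speed; }
}

class MinMaxBySpeed {
    // One comparator per property; add more (e.g. by name) as needed.
    static final Comparator<BehaviourItem> ORDER_BY_SPEED =
            Comparator.comparingInt(BehaviourItem::getSpeed);

    static BehaviourItem slowest(List<BehaviourItem> items) {
        return Collections.min(items, ORDER_BY_SPEED);
    }

    static BehaviourItem fastest(List<BehaviourItem> items) {
        return Collections.max(items, ORDER_BY_SPEED);
    }

    public static void main(String[] args) {
        List<BehaviourItem> items = Arrays.asList(
                new BehaviourItem("walk", 30),
                new BehaviourItem("idle", 10),
                new BehaviourItem("run", 50));
        System.out.println(slowest(items).getName() + " / " + fastest(items).getName());
    }
}
```

Both Collections.min and Collections.max throw NoSuchElementException on an empty list, so callers with possibly empty input need to check first.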

SOAP Client, Following an example

java,soap,saaj

Actually, you can generate the classes with SoapUI. Your program can then easily call the service using the generated classes, without constructing your own request header and body. But you need a library; for example, the Java JDK comes with the JAX-WS library. Tutorial: http://www.soapui.org/soap-and-wsdl/soap-code-generation.html...

Finding embeded xpaths in a String

java,regex

Use {} instead of () because {} are not used in XPath expressions and therefore you will not have confusions.

Get network interfaces on remote machine

java,network-programming

No, we cannot, by definition. The IP address is needed to hide the MAC address from the external world. To retrieve it you definitely need some code running on that machine, which means you need some kind of agent. You can either implement it in Java or use platform specific...