In the glass dataset, all values (except for "RI") are percentages, which for each row sum up to ~100%. So they are by definition NOT independent. For instance, if a glass contains 50% silicon (Si) and 30% aluminum, these two components alone comprise 80% of the theoretical 100%. So for...
r,classification,weka,scatter-plot
You can use Multidimensional Scaling (MDS) to first reduce the dimensionality of your data and then plot it. This method tries to preserve the distances between points when projecting into a lower dimension. Here is an example in R for the iris dataset: data <- iris colors <- as.integer(as.factor(data$Species)) d...
I think your project is missing the Weka library (weka.core.Attribute). Right-click on your project --> Deployment Assembly --> Add --> Archive from file system (if you're using an external jar file). Once you have added it, clean your project and rerun it.
Sure, it already exists. Please check class Vote.
Sheepishly, I continued my Googling and found the solution here: http://forums.pentaho.com/showthread.php?152334-WEKA-RotationForest-by-comman-line-is-not-working! Basically, I needed to start my syntax with: java -cp wekafiles/packages/alternatingDecisionTrees/alternatingDecisionTrees.jar:wekafiles/packages/rotationForest/rotationForest.jar:weka/weka.jar or java -cp [path-to-package_1] : [path-to-package_2] : [path-to-weka.jar] Then, I can invoke weka.classifiers.meta.RotationForest and weka.classifiers.trees.ADTree and go forward:...
The output of the training/testing in Weka depends on the type of the attribute that you are trying to predict. If your attribute is nominal, you will get a confusion matrix and accuracy value. If your attribute is numeric, you will get a correlation coefficient. In your small and large...
Below is a method that should help you solve your problem; you can adapt it to your needs. public void showPredictions( ){ BufferedReader reader=null; reader= new BufferedReader(new FileReader("SparseDTM.arff")); Instances data = new Instances(reader); double[] predictions; try { NaiveBayes classifier = new NaiveBayes(); classifier.buildClassifier(data); predictions = eval.evaluateModel(classifier, data...
I am attempting to answer the question using a different text classification task than spam classification. Say, I have the following training data: "The US government had imposed extra taxes on crude oil", petrolium "The German manufacturers are observing different genes of Canola oil", non-petrolium And the following test data:...
java,runtime-error,weka,multinomial
Based on your data, it appears that your last attribute is a nominal data type (Contains mostly numbers, but there are some strings as well). LinearRegression will not allow the prediction of nominal classes. What you could potentially do to ensure that your given dataset works is run it through...
I did some stupid things; I just needed the MultilayerPerceptron. I solved that with: java -classpath weka.jar weka.classifiers.functions.MultilayerPerceptron -L 0.3 -M 0.5 -N 500 -V 0 -S 0 -E 20 -H a Just calling the multilayer....
java,model,classification,weka,training-data
java weka.classifiers.trees.J48 -no-cv -t /some/where/train.arff -d /other/place/j48.model How I got there: java weka.classifiers.trees.J48 --help lists the available options, among others: -no-cv Do not perform any cross validation. So when I use your command and add the -no-cv flag, that seems to do what you want....
I found the videos below quite helpful when I first got my hands on text classification using Weka. You might want to take a look. Weka Tutorial 31: Document Classification 1 (Application) Weka Tutorial 32: Document classification 2 (Application) WEKA Text Classification for First Time & Beginner Users You might...
The error java.lang.IllegalArgumentException: Attribute neither nominal nor string! at weka.core.Instance.setValue(Instance.java:687) does not refer to stringValue, but to the class attribute of the instance. When you do stumpyInsts.setClassIndex(stumpyInsts.numAttributes() - 1); you tell stumpyInsts which index its class has, but not that it should be a nominal or String attribute. According to this...
According to the ARFF Format Documentation, REAL is not a valid attribute type. Try NUMERIC. Also be careful with quotes. The parser may assume that " is used to quote strings, and your quotes do not match....
The -K parameter does not appear to work when a classifier is being loaded, but rather when training the model from the command prompt. This is likely due to the fact that the kNN model is already trained with, say, k=1, so changing k would change the model that has...
machine-learning,weka,data-mining,rapidminer,text-classification
The simplest way is to break the dataset into binary problems. If, for example, you have the dataset text1: science text2: sports, politics Break it into 3 datasets: dataset1 (science): text1:true, text2:false dataset2 (sports): text1:false, text2:true dataset3 (politics): text1:false, text2:true Create 3 binary classifiers, one for each class, use...
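The decomposition into one binary problem per class can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `to_binary_problems` and the dictionary layout are made up for the example and are not part of Weka or RapidMiner.

```python
def to_binary_problems(labelled_docs):
    """Split a multi-label dataset into one true/false dataset per class.

    labelled_docs maps a document id to the set of classes it belongs to.
    """
    classes = sorted({label for labels in labelled_docs.values() for label in labels})
    return {
        cls: {doc: (cls in labels) for doc, labels in labelled_docs.items()}
        for cls in classes
    }


# The two example documents from the answer above.
docs = {"text1": {"science"}, "text2": {"sports", "politics"}}
binary = to_binary_problems(docs)
for cls, targets in binary.items():
    print(cls, targets)
```

Each entry of `binary` would then be used to train its own binary classifier.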
http://www.cs.tufts.edu/~ablumer/weka/doc/weka.classifiers.DistributionClassifier.html import weka.classifiers.DistributionClassifier; ...
You can extract the majority of the values you are interested in directly from the evaluation object. I am unsure of "Coverage of cases", and "mean rel. region". The rest can be done as follows: Instances train = // load train instances ... Instances test = // load test instances...
The Trainable Weka Segmentation plugin doesn't adhere to the macro recording conventions of ImageJ, mainly because of its complex structure. However, the correct way to interact with the plugin by macro scripting is described extensively in its documentation on the Fiji wiki. In summary, you need to do something like: open("C:\\input\\test.tif");...
Ok, the problem here is that I use SparseInstance, which drops values equal to 0. I intuitively thought that this concerns only numeric attributes, and that only their values are erased, with no effect on nominal ones. But I missed this fragment in the documentation: this also includes...
If it really is a string (and not a nominal value), you can use StringToWordVector Converts String attributes into a set of attributes representing word occurrence (depending on the tokenizer) information from the text contained in the strings. The set of words (attributes) is determined by the first batch filtered...
I try to answer your question based on the little information you provide. And I haven't worked with the forest-fires data set, but by inspection I see that the classifier attribute "area" often has the value 0. Maybe you can't simply filter out these rows with Area = 0. Your...
Any classifier in WEKA can be tested using an Evaluation object like so: Evaluation eTest = new Evaluation(testInstances); eTest.evaluateModel(yourUpdatableModelHere, testInstances); //Print the results System.out.println(eTest.toSummaryString()); //Get the confusion matrix double[][] confMatrix = eTest.confusionMatrix(); For more, see the JavaDoc on Evaluation here....
java,machine-learning,weka,liblinear
I found that to use liblinear in the Weka GUI, you have to install the weka-3-7 version. Then click on Tools --> Package manager --> install liblinear.
Change your code so that you can load your classifier differently. protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { response.setContentType("text/html"); PrintWriter out = response.getWriter(); // Get formula input ... // Read Classifier and predict Success try { Classifier J48 = getClassifier(); if (klassifizierer != null) {out.println("KLASSIFIZIERER IS LOADED");}...
This link (in part) describes creating an Instance object. Instead of writing your data to a .arff file and then reading it, you can skip the middle man. You may find this question (Define input data for clustering using WEKA API) helpful for a concrete example of creating an Instances...
The problem lies with the evaluation on test set with class attributes set to ? or empty. You will get some results on training sets because for training data, you know all the data labels. But for test set where your labels are unknown, how do you know that the...
Normally you use an independent test set for testing your trained classifiers. If you use this option, you test your trained classifier on your training set, which gives over-fitted, grossly optimistic results. Read the following question for more information....
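A toy sketch in plain Python of why scoring on the training set is optimistic. The 1-nearest-neighbour rule and all data points are invented for illustration; nothing here comes from the original question or from Weka.

```python
def nearest_label(train, x):
    """Return the label of the training point closest to x (1-nearest neighbour)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def accuracy(train, test):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(nearest_label(train, x) == y for x, y in test)
    return hits / len(test)


# Hypothetical, deliberately noisy data.
train = [(1.0, "a"), (2.0, "b"), (3.0, "a"), (4.0, "b")]
held_out = [(1.4, "b"), (2.6, "b"), (3.4, "b"), (4.5, "b")]

print("accuracy on training set:", accuracy(train, train))
print("accuracy on held-out set:", accuracy(train, held_out))
```

Because every training point is its own nearest neighbour, the training-set score is perfect regardless of how well the model generalises.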
time-series,weka,timeserieschart
I figured out the answer. Create a TSEvaluation object Call evaluateForecaster(TSForecaster forecaster, boolean buildModel, java.io.PrintStream... progress) Call graphFutureForecastOnTesting(GraphDriver driver, TSForecaster forecaster, java.util.List targetNames) which returns a JPanel For example- // Your instances Instances train; // Your WekaForecaster WekaForecaster forecaster; ... // Custom hold out set size int holdOutSetSize = 1;...
There is a way to do it in weka. You should look into clustering: https://www.youtube.com/watch?v=zjYUYJ2b4r8 I would also suggest trying to get more features (more columns) for better results....
Which version of weka do you use? I think you can't instantiate Instance, because it is an interface. Moreover, when you create new Attribute with: new Attribute("attr1") it gets -1 as an attribute index. If you have your training Instances from arff file, use attribute from them when you create...
java,exception,weka,data-mining,k-means
According to the Attribute-Relation File Format (ARFF): Lines that begin with a % are comments. (http://www.cs.waikato.ac.nz/ml/weka/arff.html) So given your moviedata.arff @data section, that could explain why there are no training instances being read in. In other words, when the exception says "Not enough training instances (required: 1, provided: 0)", it...
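To see the effect of the comment rule, here is a deliberately simplified Python sketch that counts the instances a reader would find in an ARFF text. It is not Weka's parser: the function name `count_arff_instances` and the sample file content are assumptions for illustration, and sparse or quoted multi-line data are ignored.

```python
def count_arff_instances(text):
    """Count data rows in ARFF text, skipping blank lines and % comments."""
    in_data = False
    count = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # blank lines and comment lines are never instances
        if not in_data:
            in_data = line.lower().startswith("@data")
            continue
        count += 1
    return count


# Hypothetical file in which every data row starts with '%'.
sample = """@relation movies
@attribute rating numeric
@data
% 4.5
% 3.0
"""
fixed = sample.replace("% ", "")

print(count_arff_instances(sample), count_arff_instances(fixed))
```

With the leading `%` in place no row counts as an instance; once it is removed, both rows are read.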
From the source: if (value < -45) { value = 0; } else if (value > 45) { value = 1; } else { value = 1 / (1 + Math.exp(-value)); } return value; Pretty simple sigmoid with a clamp using the logistic function. That said, you're very likely not...
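The quoted logic translates almost one to one into Python. The sketch below only mirrors the snippet above; the function name `clamped_sigmoid` is an assumption chosen for the example.

```python
import math


def clamped_sigmoid(value):
    """Logistic function, clamped to exactly 0 or 1 outside [-45, 45]."""
    if value < -45:
        return 0.0
    if value > 45:
        return 1.0
    return 1.0 / (1.0 + math.exp(-value))


for v in (-100, -2, 0, 2, 100):
    print(v, clamped_sigmoid(v))
```

The clamp avoids evaluating `exp` for very large magnitudes, where the result is already indistinguishable from 0 or 1.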
It seems to me that you want to do two things: Convert JSON Data to .arff Write composite attributes to the arff file I don't know if arff files support #2. Here is some code to transform your JSON into arff (#1 ) In R: library(RWeka) library(rjson) json = rjson::fromJSON('[{...
python,machine-learning,scikit-learn,weka,arff
I really recommend liac-arff. It doesn't load directly to numpy, but the conversion is simple: import arff, numpy as np dataset = arff.load(open('mydataset.arff', 'r')) data = np.array(dataset['data']) ...
From the Classify panel choose the meta-learner "FilteredClassifier", and set its filter parameter to "weka.filters.supervised.instance.ClassBalancer". Then select which classifier to use, e.g. J48, AdaBoost, etc.
The issue is with python-weka-wrapper. The bug has not been fixed in the stable release, but it has been fixed in the current build, so you can directly build from source. This issue was not present in older stable versions....
java,csv,nullpointerexception,weka,arff
Test with these methods to see if the problem is the file: http://docs.oracle.com/javase/7/docs/api/java/io/File.html#exists() http://docs.oracle.com/javase/7/docs/api/java/io/File.html#canRead() It seems you have a file problem. Try checking the paths.....
java,eclipse,serialization,weka
The 'eclipse.ini' file sets the memory for Eclipse itself not your program. To set the memory for your program open the 'Run > Run Configurations' dialog. Find your program in the Java Application section and specify the -Xmx option you want in the 'Arguments > VM arguments' section....
Basically you should try to use the option "-M" for SMO to fit logistic models, in training process. Check the solution proposed here. It should work!
The reason you are getting 100388D as 100388.0 and 100390F as 100390.0 is because the values are ending with D and F respectively. In Java, this means the values are Double and Float (D stands for Double and F stands for Float). That is why when Weka is converting them...
In its datasets, Weka considers a newline character as an indication of the end of instance. Your line 17 is actually a multi-line tweet which confuses Weka. You can use either a RegEx to get rid of the newline characters in every single tweet or during downloading the tweets, clean...
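A minimal Python sketch of the regular-expression clean-up mentioned above. The function name `flatten_tweet` and the sample text are assumptions for illustration; the same pattern could be applied while downloading the tweets.

```python
import re


def flatten_tweet(text):
    """Collapse every run of whitespace, including newlines, into one space."""
    return re.sub(r"\s+", " ", text).strip()


tweet = "first line\nsecond line\r\n third"
print(flatten_tweet(tweet))
```

After this step each tweet occupies a single line, so one line again corresponds to one instance.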
@ECHO OFF setlocal SET "CLASSPATH="C:\Program Files (x86)\Weka-3-6\weka.jar"" FOR /r %%I IN (*.arff) DO ( ECHO Running %%~nI java weka.classifiers.functions.LinearRegression -t %%~nI -x 10 ) endlocal Variables set inside the FOR block are only expanded after the loop has finished, so it is better to define the class path outside the loop....
java,machine-learning,weka,arff
The answer provided here will help address some of your concerns: Does test file in WEKA require or less number of features as train. In short, you first need to make sure you have the same attributes for your training and testing instances (you should be able to insert '?'...
Missing value issue Use the ReplaceMissingValues filter in Weka. Detail about the class can be found here Missing class issue Those are your test instances. You need to build classifiers and then apply on these instances with '?' tags to provide them a class label. ...
java,weka,text-classification,arff
You can use Weka's StringToWordVector filter to convert the text into a word vector (but not necessarily a sparse matrix). Take a look at my tutorial on this.
I ended up finding the solution to the problem. The constructor accepts a signature which uses the deprecated class FastVector. I have added a snippet of my code in case it helps someone. attInfo = FastVector(); attInfo.addElement(weka.core.Attribute('att1')); attInfo.addElement(weka.core.Attribute('att2')); attInfo.addElement(weka.core.Attribute('att3')); % build the class attribute classValues = FastVector(); classValues.addElement(java.lang.String('0')); classValues.addElement(java.lang.String('1'));...
First of all, those JDBC warning messages are nothing to worry about; read here. The following code reads a CSV file and outputs its contents to the console; see it on GitHub. package wekaExamples.loadDatasetExamples; import weka.core.Instances; import weka.core.converters.*; import java.io.*; /** * Created by atilla.ozgur on 17.12.2014. */ public class LoadCsvExample { public static...
I found a solution to my problem, and I hope it'll be useful. //Declaring attributes Attribute PT1 = new Attribute("PT1"); Attribute w1 = new Attribute("w1"); Attribute d1 = new Attribute("d1"); Attribute PT2 = new Attribute("PT2"); Attribute w2 = new Attribute("w2"); Attribute d2 = new Attribute("d2"); // Declare the class attribute...
machine-learning,cluster-analysis,weka
You should drop the class attribute before you do clustering. It has too much predictive power, and as a consequence of this, the clustering algorithm has a strong bias to prefer the class attribute internally. You can do this attribute removal in the "Preprocess" panel by clicking the "remove" button,...
Between switching the files, adding them to the build path etc, I had accidentally added a file without modifying the database parameter in the DatabaseUtils.prop file.
Try getProbability method on BayesNet class. Here is what I do. for(int i = 0; i < bnet.getCardinality(nodeIndex); i++) { System.out.print(bnet.getNodeValue(nodeIndex, i) + " = " + bnet.getProbability(nodeIndex, row, i) + " "); } Where row is 0 <= row < bnet.getParentCardinality() and each value of row corresponds to a...
I have had similar issue, I solved it like this, you can change it to fit your requirement: public class SerialDataReader { static SerialPort serialPort; private static final int DATA_RATE = 9600; private static final String PORT_NAME = "COM4"; public static void main(String[] args) { SerialDataReader serialDataReader = new SerialDataReader();...
machine-learning,weka,random-forest,decision-tree
When a tree is split on a numerical attribute, it is split on a condition like a>5. So, this condition effectively becomes a binary variable and the criterion (information gain) is exactly the same. P.S. For regression, the commonly used criterion is the sum of squared errors (for each leaf, then summed over leaves). But...
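A small Python sketch of that idea: the numeric attribute is turned into the binary test `a > threshold`, and information gain is computed exactly as for any two-valued attribute. The function names and the toy data are assumptions for illustration, not code from Weka.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def info_gain_numeric(values, labels, threshold):
    """Information gain of the binary test `value > threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # the test does not separate anything
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


# Hypothetical attribute a and class labels.
a = [1, 3, 4, 6, 8, 9]
y = ["no", "no", "no", "yes", "yes", "yes"]

gain_at_5 = info_gain_numeric(a, y, 5)
gain_at_3 = info_gain_numeric(a, y, 3)
print(gain_at_5, gain_at_3)
```

A tree learner would evaluate several candidate thresholds this way and keep the one with the highest gain.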
java,classification,weka,linear-regression
Linear Regression should accept both nominal and numeric data types. It is simply that the target class cannot be a nominal data type. The Model's toString() method should be able to spit out the model (other classifier options may also be required depending on your needs), but if you...
java,nullpointerexception,weka,nearest-neighbor
Well, using the constructor without a parameter and setting the instances in the next step solved the issue here. I mean I changed KDTree knn = new KDTree(ds); to KDTree knn = new KDTree(); knn.setInstances(ds); and it works. I don't know what to tell, just congrats weka!...
I have found the solution to my own question and thus I am providing the information here so that it might help someone else. My original problem was that I was getting an "UnassignedDatasetException". To solve that I added a method call to setDataset like so: ....previous code omitted, can...
The RWeka package itself cannot do that. However, RWeka uses the partykit package for displaying its trees, which can do what you want. Look at vignette("partykit", package = "partykit") to see how you can construct a recursive partynode object with pre-specified partysplits and then turn them into a constparty. The...
If you explicitly use the -classpath flag, the %CLASSPATH% variable is not used. You can either add libsvm to the -classpath (it's semicolon separated on windows) or add weka to the CLASSPATH variable.
My understanding is that you would like to store the predicted labels of your model into your missing labels. What you could do is right-click on the Model after training, then select 'Visualize Classifier Errors'. In this visualization screen, set Y as the predicted class and then save the new...
Found it by testing all its methods: ibk.buildClassifier(dataSet); rez2 = ibk.distributionForInstance(i2); //distrib int result = (int)rez2[0]; //it goes the same with KStar I came to realize that classifiers in Weka normally run with discrete data (equal steps from min to max), and my data is not all discrete. IBk and KStar...
The UnsatisfiedLinkError is thrown when an application attempts to load a native library like .so in Linux, .dll on Windows or .dylib in Mac and that library does not exist. Specifically, in order to find the required native library, the JVM looks in both the PATH environment variable and the...
I created a new attribute as a string called ID. Then I called Weka with: java -classpath weka.jar weka.classifiers.meta.FilteredClassifier -F weka.filters.unsupervised.attribute.RemoveType -t file -W "weka.classifiers.functions.MultilayerPerceptron" ...
machine-learning,weka,decision-tree
It is a problem with the book (keeping the answer here so that it can help other readers of the book). The book expects only one negative case in the end_rack category (look for (5,1) in the author's tree diagram). In data provided in the book and even on the book...
machine-learning,scikit-learn,classification,weka,libsvm
You can look at RandomForest, which is a well-known and quite efficient classifier. In scikit-learn, some classes can run on several cores, such as RandomForestClassifier. It has a constructor parameter that can be used to define the number of cores or a value that will use...
When selected (in the scenario that you are able to select "InfoGainAttributeEval"), if you double click the white space containing the text "InfoGainAttributeEval", then select capabilities, the following is presented: CAPABILITIES Class -- Missing class values, Nominal class, Binary class Attributes -- Empty nominal attributes, Nominal attributes, Numeric attributes, Unary...
public class Run { public static void main(String[] args) throws Exception { ConverterUtils.DataSource source1 = new ConverterUtils.DataSource("./data/train.arff"); Instances train = source1.getDataSet(); // setting class attribute if the data format does not provide this information // For example, the XRFF format saves the class attribute information as well if (train.classIndex() ==...
bash,csv,weka,file-conversion,arff
Clone this github repository. It contains an arff2csv tool in the "tools" subdirectory. arff2csv is designed to run in pipes of unix commandline tools. https://github.com/jeroenjanssens/data-science-at-the-command-line arff2csv is a one-line shell-script that calls another shell script that calls weka.jar, so it needs java installed on your machine; and note that arff2csv...
What I have found so far: CostSensitiveClassifier has two operation modes. It may either set explicit weights on samples (by using the .weight() method) or resample with replacement. In my particular case, it was using the latter approach. Therefore, the class arrangement described above would resample twice the...
In short, is J48 either a linear or a non linear classifier? I don't know. However, the decision boundaries of J48 can be made, in a way, "stepwise linear". So you can approximate a nonlinear decision boundary if you set minNumObjects low enough and set pruning to false (=...
Trivial Training-validation-test Create two datasets from your labelled instances. One will be training set and the other will be validation set. The training set will contain about 60% of the labelled data and the validation will contain 40% of the labelled data. There is no hard and fast rule for...
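A minimal Python sketch of the 60/40 split described above. The function name `split_labelled`, the fixed seed, and the integer stand-ins for instances are assumptions for illustration; the fraction is a parameter because there is no single correct ratio.

```python
import random


def split_labelled(instances, train_fraction=0.6, seed=42):
    """Shuffle the labelled instances and cut them into training and validation sets."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Ten hypothetical labelled instances, represented here by their ids.
train, validation = split_labelled(range(10))
print(len(train), len(validation))
```

Shuffling before cutting avoids a split that simply follows the order in which the data was collected.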
It seems you changed the code from the blog's GitHub repository in one detail, and that is the cause of your error: c.learn(); c.evaluate(); vs c.evaluate(); c.learn(); The evaluate() method resets the classifier with the line: classifier = new FilteredClassifier(); but doesn't build a model. The actual evaluation uses a...
cluster-analysis,weka,data-mining,k-means,rapidminer
Weka often uses built-in normalization at least in k-means and other algorithms. Make sure you have disabled this if you want to make results comparable. Also understand that k-means is a randomized algorithm. Different results even from the same package are to be expected (and desirable)....
After searching more deeply, I found the answer. The problem was caused by a changed function that creates a feature. Because this function had changed, the feature values in the feature set no longer matched those in the ARFF file. All results are logical now.
generate_arff('my_program.freq_queries.out','my_program.arff'). This does not work in windows for some reason, but works in Linux...
machine-learning,weka,svm,libsvm
Yes, the default kernel is RBF with gamma equal to 1/k. See other defaults in javadocs here or here. NB: Weka contains its own implementation - SMO, but it also provides wrapper for libsvm, and "LibSVM runs faster than SMO" (note that it requires installed libsvm, see docs)....
matlab,weka,activity-recognition
You can import either CSV files or ARFF files. The default format for importing data in Weka is ARFF. The details of the ARFF format are described here. You can basically just add the following header: @RELATION movement @ATTRIBUTE x NUMERIC @ATTRIBUTE y NUMERIC @ATTRIBUTE z NUMERIC @ATTRIBUTE...
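Adding the header can also be scripted. The Python sketch below assumes a CSV with a header row in which every column is numeric; the function name `csv_to_arff` and the sample values are invented for illustration, and only the x, y and z attributes visible above are used.

```python
import csv
import io


def csv_to_arff(csv_text, relation="movement"):
    """Prepend an ARFF header to CSV data, declaring every column NUMERIC."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE {name} NUMERIC" for name in header]
    lines += ["", "@DATA"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Hypothetical accelerometer readings.
arff_text = csv_to_arff("x,y,z\n0.1,0.2,9.8\n0.0,0.3,9.7\n")
print(arff_text)
```

Nominal columns, such as an activity label, would need their own attribute declaration with the list of allowed values instead of NUMERIC.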
I don't believe that K-Means Clustering requires a class attribute. If you have set one for your instances, please try to remove it and rerun the code. This guide may assist in methods for building a clustering model. Hope this helps!...
java,weka,text-classification,categorization
It seems like you changed the code from the website you referenced in some crucial points, but not in a good way. I'll try to draft what you're trying to do and what mistakes I've found. What you (probably) wanted to do in extractFeature is Split each tweet into words...
Case 1: Both your training and test set have class labels Training: @relation simple-training @attribute feature1 numeric @attribute feature2 numeric @attribute class {a,b} @data 1, 2, b 2, 4, a ....... Testing: @relation simple-testing @attribute feature1 numeric @attribute feature2 numeric @attribute class {a,b} @data 7, 12, a 8, 14, a ....... In this case,...
You can use the cluster_instance(Instance) method to obtain the 0-based index of the cluster or the distribution_for_instance(Instance) method to obtain the cluster distribution: for inst in data: cl = clusterer.cluster_instance(inst) dist = clusterer.distribution_for_instance(inst) print("cluster=" + str(cl) + ", distribution=" + str(dist)) ...
In data mining, there is a multi-way trade-off between the number of features that you use, your accuracy, and the time it takes to generate a model. In theory, you'd want to include every possible feature to boost accuracy; however, going about data mining in this way guarantees lengthy model generation...
You can try to convert it to binary features, for each such nominal attribute, e.g. has_A, has_B, has_C. Then if you scale it i1 and i3 will be closer as the mean for that attribute will be above 0.5 (re to your example) - i2 will stand out more. If...
java,android,android-studio,weka
It was a problem with the weka.jar library. The standard one that I downloaded from the Weka site does not work with Android. I have now downloaded a modified one from user rjmarsan: https://github.com/rjmarsan/Weka-for-Android Working now :)
The function as.logical will coerce any non-zero number to TRUE and zero to FALSE. Assuming your CSV has been loaded into a data frame called boxdata with the same column names as in the CSV: ifelse(as.logical(boxdata$Receiver_TotalVideoDecoderErrors), 'Error', 'No error') Or you can use broadcasting to do the work for you....
The error: Exception in thread "main" weka.core.WekaException: weka.clusterers.SimpleKMeans: Cannot handle any class attribute! states that SimpleKMeans cannot handle a class attribute. This is because K-means is an unsupervised learning algorithm, meaning that there should be no class defined. Yet, one line in the code sets the class value. If you...
data,weka,data-mining,equation,linear-regression
If the vendor is equal to any of the line's nominal values, then the value is a one, otherwise, the value is a zero. For example, in line 1: -152.7641 * vendor=microdata,prime,formation,harris,dec,wang,perkin-elmer,nixdorf,bti,sratus,dg,burroughs,cambex,magnuson,honeywell,ipl,ibm,cdc,ncr,basf,gould,siemens,nas,adviser,sperry,amdahl The value would be subtracted by 152.7641 if and only if the vendor is equal to one of...
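The indicator behaviour of such a term can be written out in a few lines of Python. The coefficient and vendor list are copied from the line quoted above; the function name `vendor_term` and the vendor "some-other-vendor" are assumptions for illustration.

```python
COEFFICIENT = -152.7641
VENDOR_GROUP = set(
    "microdata,prime,formation,harris,dec,wang,perkin-elmer,nixdorf,bti,sratus,"
    "dg,burroughs,cambex,magnuson,honeywell,ipl,ibm,cdc,ncr,basf,gould,siemens,"
    "nas,adviser,sperry,amdahl".split(",")
)


def vendor_term(vendor, coefficient, vendor_group):
    """Contribution of one nominal term: coefficient * (1 if vendor in group else 0)."""
    indicator = 1 if vendor in vendor_group else 0
    return coefficient * indicator


print(vendor_term("ibm", COEFFICIENT, VENDOR_GROUP))
print(vendor_term("some-other-vendor", COEFFICIENT, VENDOR_GROUP))
```

The full prediction is the sum of all such terms plus the numeric terms and the intercept.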
Once you have loaded your ARFF, you could apply a StringToWordVector to build your word list. From there, you could use a classifier (such as Naive Bayes) to predict your classes (you may need to filter the other attributes to ensure they are not used as inputs for the classifier...
You have total 8 classes: a, b, c, d, e, f, g, h. You will thus get 8 different TP, FP, FN, and TN numbers. For instance, in the case of a class, TP (instance belongs to a, classified as a) = 1086 FP (instance belongs to others, classified as...
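The per-class counts can be derived mechanically from a confusion matrix. The Python sketch below assumes rows are actual classes and columns are predicted classes (the layout Weka prints); the function name `per_class_counts` and the 3x3 matrix are invented for illustration and are not the matrix from the question.

```python
def per_class_counts(matrix, labels):
    """Compute TP, FP, FN and TN for every class of a confusion matrix.

    matrix[i][j] = number of instances of actual class i predicted as class j.
    """
    total = sum(sum(row) for row in matrix)
    counts = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                 # actual class i, predicted as another
        fp = sum(row[i] for row in matrix) - tp  # other classes, predicted as i
        tn = total - tp - fn - fp
        counts[label] = {"TP": tp, "FP": fp, "FN": fn, "TN": tn}
    return counts


# Hypothetical three-class matrix.
matrix = [
    [5, 1, 0],
    [2, 6, 1],
    [0, 2, 7],
]
counts = per_class_counts(matrix, ["a", "b", "c"])
for label, c in counts.items():
    print(label, c)
```

With 8 classes the same function would return 8 sets of TP, FP, FN and TN values.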
You're giving the InputMappedClassifier the wrong options. It's complaining that you are giving it the training (-t) and test (-T) data. It supports the following: Options specific to weka.classifiers.misc.InputMappedClassifier: -I Ignore case when matching attribute names and nominal values. -M Suppress the output of the mapping report. -trim Trim white...