java,optimization,machine-learning,scipy,stanford-nlp

What you have should be just fine. (Have you actually had any problems with it?) Setting termination both on max iterations and max function evaluations is probably overkill, so you might omit the last argument to qn.minimize(), but it seems from the documentation that scipy does use both with a...

machine-learning,artificial-intelligence,neural-network,classification,backpropagation

The "units" are just floating point values. All computations happening there are vector multiplications, and thus can be parallelized well using matrix multiplications and GPU hardware. The general computation looks like this: double v phi(double[] x, double[] w, double theta) { double sum = theta; for(int i = 0; i...
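
As a rough illustration of the computation described above, here is a minimal Python sketch of a single unit. The name `unit_output` and the choice of a sigmoid as the activation are assumptions made for this example; the original snippet is cut off before it shows which activation it applies.

```python
import math


def unit_output(x, w, theta):
    """Weighted sum of the inputs plus the bias theta, squashed by a sigmoid.

    The sigmoid is an assumption for this sketch; any activation could be used.
    """
    total = theta  # start from the bias, as in the snippet above
    for xi, wi in zip(x, w):
        total += xi * wi
    return 1.0 / (1.0 + math.exp(-total))


# One unit with two inputs; the weighted sum here is 0.5 - 0.5 + 0.0 = 0.0
activation = unit_output([1.0, 2.0], [0.5, -0.25], 0.0)
```

Evaluating a whole layer repeats this for every unit, which is why it maps so well onto a matrix-vector product.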

python,numpy,encoding,machine-learning,scikit-learn

If I understand your question correctly, then it is simply quadratic programming - all the constraints you mentioned (both equalities and inequalities) are linear.

python,r,machine-learning,scikit-learn,regression

The best_score_ is the best score from the cross-validation. That is, the model is fit on part of the training data, and the score is computed by predicting the rest of the training data. This is because you passed X_train and y_train to fit; the fit process thus does not...

optimization,machine-learning,dataset

I don't know if this format really provides a better representation, but I can speculate why it can be more efficient. First, as they state in the format description, "Having data of the same precision consecutive enables hardware vectorization."; consider also Wikipedia: "Vector processing techniques have since been added to almost all...

On systems supporting /dev/stdout (and /dev/stderr), you may try this: vw -t -i model.vw --daemon --port 26542 --link=logistic -r /dev/stdout The daemon will write raw predictions into standard output which in this case end up in the same place as localhost port 26542. The relative order of lines is guaranteed...

machine-learning,scikit-learn,classification,weka,libsvm

You can look at RandomForest, which is a well-known and quite efficient classifier. In scikit-learn, some classes can run on several cores, such as RandomForestClassifier. It has a constructor parameter that can be used to define the number of cores or a value that will use...

matlab,machine-learning,feature-extraction,p-value

You need to perform an ANOVA (Analysis of Variance) test for each of the voxels. From the above linked Wikipedia page: In its simplest form, ANOVA provides a statistical test of whether or not the means of several groups are equal, and therefore generalizes the t-test to more than two...

r,machine-learning,classification,regression,caret

Look at the help page ?models. Also, here are some links too. Also: > is_class <- unlist(lapply(mods, function(x) any(x$type == "Classification"))) > class_mods <- names(is_class)[is_class] > head(class_mods) [1] "ada" "AdaBag" "AdaBoost.M1" "amdai" "avNNet" [6] "bag" ...

machine-learning,cross-validation

A t-test is a type of statistical test on your data. Say you are comparing two datasets and you want to know if the two data sets are significantly different from each other. Then you will do a t-test. Cross validation is more of a technique for evaluating your models. Typically...
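
To make the cross-validation half of that concrete, here is a small sketch of how k-fold splitting can be done by hand. The function name `k_fold_indices` and the interleaved assignment of samples to folds are assumptions for illustration, not something the answer prescribes.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    # Assign sample i to fold i % k (a simple, deterministic assumption)
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Each sample ends up in the test set exactly once across the 5 splits
splits = list(k_fold_indices(10, 5))
```

A model would be fit on each `train` part and scored on the matching `test` part, and the scores averaged.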

python,multithreading,machine-learning,theano

The most direct answer to your question is: you can't parallelize training in the way you desire. BLAS, OpenMP, and/or running on a GPU only allow certain operations to be parallelized. The training itself can only be parallelized, in the way you want, if the training algorithm is designed to...

machine-learning,neural-network,backpropagation,feed-forward

In short, yes it is a good approach to use a single network with multiple outputs. The first hidden layer describes decision boundaries (hyperplanes) in your feature space and multiple digits can benefit from some of the same hyperplanes. While you could create one ANN for each digit, that kind...

machine-learning,neural-network,point-clouds

An RBF network essentially involves fitting data with a linear combination of functions that obey a set of core properties -- chief among these is radial symmetry. The parameters of each of these functions are learned by incremental adjustment based on errors generated through repeated presentation of inputs. If I...

image-processing,machine-learning,computer-vision

Test, Train and Validate Read this stats SE question: What is the difference between test set and validation set? This is basic machine learning, so you should probably go back and review your course literature, since it seems like you're missing some pretty important machine learning concepts. Do we...

c++,machine-learning,text-processing,text-extraction,lda

Umm... surely it should be easy enough to code? The stupidest, yet guaranteed to work, approach will be to iterate over all the documents twice. During the first iteration, create a hashmap of the words and a unique index (a structure like HashMap), and during the second iteration, you do...
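
A minimal sketch of the two-pass idea, written in Python rather than C++ for brevity. The helper names are hypothetical, and what the second pass does (replacing each word by its index) is an assumption, since the answer is cut off at that point.

```python
def build_vocabulary(documents):
    """First pass: give every distinct word a unique, stable index."""
    vocab = {}
    for doc in documents:
        for word in doc.split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def encode(documents, vocab):
    """Second pass (assumed): replace each word by its index."""
    return [[vocab[word] for word in doc.split()] for doc in documents]


docs = ["the cat sat", "the dog sat down"]
vocab = build_vocabulary(docs)
encoded = encode(docs, vocab)
```

In C++ the dictionary would typically be a `std::unordered_map<std::string, int>`.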

matlab,machine-learning,computer-vision,deep-learning,conv-neural-network

These two lines net = cnnff(net, x); [~, h] = max(net.o); feed an image x through the network and then compute the index h which had the largest output activation. You can simply do the same for an arbitrary input image x and it will give you the class h....

machine-learning,statistics,linear-regression

So Linear Regression assumes your data is linear even in multiple dimensions. It won't be possible to visualize high dimensional data unless you use some methods to reduce the high dimensional data. PCA can do that but bringing it down to 2 dimensions won't be helpful. You should do Cross...

python,pandas,machine-learning,scikit-learn

It looks like I have answered my own question. The estimators_samples_ attribute of DecisionTreeClassifier is what I want.

machine-learning,nlp,scikit-learn,svm,confusion-matrix

Classification report must be straightforward - a report of P/R/F-Measure for each element in your test data. In multiclass problems, it is not a good idea to read Precision/Recall and F-Measure over the whole data, because any imbalance would make you feel you've reached better results. That's where such reports help....

machine-learning,pattern-recognition,bayesian-networks

Is this the right approach? There are many possible approaches, but here's a very simple and effective one that fits the domain: Given the nature of the application, chronological order doesn't really matter; e.g. it doesn't matter if the Fan gets turned on before the Light. Also given that you...

python,machine-learning,scikit-learn,classification,text-classification

text_data = load_files("C:/Users/USERNAME/projects/machine_learning/my_project/train", ...) According to the documentation, that line loads your file's contents from C:/Users/USERNAME/projects/machine_learning/my_project/train into text_data.data. It will also load target labels (represented by their integer indexes) for each document into text_data.target. So text_data.data should be a list of strings and text_data.target a list of integers. The labels...

One thing you can do is calculate the correlation between rows. Take a look at the tutorial about summary statistics on the MLlib website. A more advanced approach would be to use dimensionality reduction. This should discover more complex dependencies....

machine-learning,cluster-analysis,k-means,hierarchical-clustering

The idea I'm suggesting is originated in text-processing, NLP and information retrieval and very widely used in situations where you have sequences of characters/information like Genetic information. Because you have to preserve the sequence, we can use the concepts of n-grams. I'm using bi-grams in the following example, though you...
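
For reference, extracting n-grams from a sequence takes only a few lines. This is a sketch with hypothetical helper names; the toy sequence is made up and is not the example from the answer.

```python
from collections import Counter


def ngrams(sequence, n=2):
    """Return all contiguous n-grams of a sequence, preserving order."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


def ngram_counts(sequence, n=2):
    """Count how often each n-gram occurs; usable as a feature vector."""
    return Counter(ngrams(sequence, n))


bigrams = ngrams("ABABC")        # bi-grams, as in the answer
counts = ngram_counts("ABABC")   # bag-of-bigrams representation
```

The counts can then be compared between sequences with whatever distance the clustering method needs.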

machine-learning,nlp,artificial-intelligence

Your tests sound very reasonable — they are the usual evaluation tasks that are used in research papers to test the quality of word embeddings. In addition, the website www.wordvectors.org can give you a good idea of how your vectors measure up. It allows you to upload your embeddings, generates...

r,machine-learning,cluster-analysis,som,unsupervised-learning

Map 1 is the average vector result for each node. The top 2 nodes that you highlighted are very similar. Map 2 is a kind of similarity index between the nodes. If you want to obtain this kind of map using the map 1 result you may have to develop...

machine-learning,statistics,classification,multilabel-classification

There are several available metrics, described in the following paper: Sokolova, Marina, and Guy Lapalme. "A systematic analysis of performance measures for classification tasks." Information Processing & Management 45.4 (2009): 427-437. See Table 3 on page 4 (430) - it contains a brief description and formula for 8 metrics; choose the...

machine-learning,probability,mle,language-model

The likelihood function describes the probability of generating a set of training data given some parameters and can be used to find those parameters which generate the training data with maximum probability. You can create the likelihood function for a subset of the training data, but that wouldn't be represent...

python,machine-learning,scikit-learn

Try this: goodwords = ((countmatrix > 1).mean(axis=0) <= 0.8).nonzero()[0] It first computes a Boolean matrix which is True wherever countmatrix > 1 and then computes its column-wise mean. If the mean is at most 0.8 (80%), the corresponding column index is returned by nonzero(). So, goodwords will contain all...
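
If it helps to see what that one-liner does step by step, here is a plain-Python equivalent on a tiny made-up matrix. The function name `good_columns` and the sample data are assumptions for illustration only.

```python
def good_columns(count_rows, min_count=1, max_fraction=0.8):
    """Return indices of columns where at most max_fraction of the rows
    have a count strictly greater than min_count."""
    n_rows = len(count_rows)
    n_cols = len(count_rows[0])
    keep = []
    for j in range(n_cols):
        # Fraction of rows whose count in column j exceeds min_count
        fraction = sum(1 for row in count_rows if row[j] > min_count) / n_rows
        if fraction <= max_fraction:
            keep.append(j)
    return keep


rows = [
    [2, 0, 3],
    [2, 1, 0],
    [5, 0, 0],
    [2, 2, 0],
    [3, 0, 0],
]
kept = good_columns(rows)  # column 0 exceeds the count in every row, so it is dropped
```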

machine-learning,neural-network

If you are talking about the session-based course (which I have passed previously): https://www.coursera.org/learn/machine-learning then it uses a batch-learning approach, in exercise 4 (which covers ANN). If you carefully study the cost function you will see that it is calculated using all of the available examples, not just one randomly chosen....

r,machine-learning,neural-network

Try using this to predict instead: res = compute(r, m2[,c("Pclass", "Sexmale", "Age", "SibSp")]) That worked for me and you should get some output. What appears to have happened: model.matrix creates additional columns ((Intercept)) which aren't part of the data that was used to build the neural net, as such in...

Sure, it already exists. Please check class Vote.

machine-learning,gradient-descent

Mathematically, we are trying here to minimise error function Error(θ) = Σ(yi - h(xi))^2 summation over i. To minimise error, we do d(Error(θ))/dθi = Zero putting h(xi) = Σ(θi*xi) summation over i and derive the above formula. The rest of the formulation can be reasoned as Gradient descent uses the...
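
Here is a small sketch of batch gradient descent for a linear hypothesis h(x) = Σ θj*xj on the squared error above. The function names, the learning rate and the toy data are assumptions; the constant factor from differentiating the square is folded into the learning rate, and the gradient is averaged over the examples.

```python
def predict(theta, x):
    """Linear hypothesis h(x) = sum_j theta_j * x_j."""
    return sum(t * xi for t, xi in zip(theta, x))


def gradient_descent(xs, ys, alpha=0.1, steps=2000):
    """Minimise the squared error by repeatedly stepping against its gradient."""
    theta = [0.0] * len(xs[0])
    m = len(xs)
    for _ in range(steps):
        errors = [predict(theta, x) - y for x, y in zip(xs, ys)]
        # Partial derivative w.r.t. theta_j, averaged over all m examples
        grad = [sum(e * x[j] for e, x in zip(errors, xs)) / m
                for j in range(len(theta))]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Toy data generated from y = 1 + 2*x; the first feature is the constant 1
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
ys = [1.0, 3.0, 5.0, 7.0]
theta = gradient_descent(xs, ys)
```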

python,machine-learning,neural-network

OK, so, first, here's the amended code to make yours work. #! /usr/bin/python import numpy as np def sigmoid(x): return 1.0 / (1.0 + np.exp(-x)) vec_sigmoid = np.vectorize(sigmoid) # Binesh - just cleaning it up, so you can easily change the number of hiddens. # Also, initializing with a heuristic...

machine-learning,artificial-intelligence,neural-network

This is a good application of convolutional neural networks. There are a number of libraries and services available for doing this. Caffe is a tool for doing this, though I don't have any experience with it. Do some googling for other tools, search for "convolutional neural networks". For services there's...

matlab,image-processing,machine-learning,octave,k-means

edge in MATLAB / Octave returns a binary / logical matrix. kmeans requires that the input be a double or single matrix. Therefore, simply cast ed to double and continue: ed=edge(de,"canny"); imshow(ed); ed = double(ed); %// Change j=kmeans(ed,3); ...

machine-learning,vowpalwabbit,precision-recall

Given that you have a pair of 'predicted vs actual' value for each example, you can use Rich Caruana's KDD perf utility to compute these (and many other) metrics. In the case of multi-class, you should simply consider every correctly classified case a success and every class-mismatch a failure to...

azure,machine-learning,batch-processing,azure-scheduler,azure-machine-learning

You would use Azure Data Factory instead of the scheduler. This would allow you to schedule the BES call into the future while identifying where the result file will end up. There are lots of examples online on how to do that....

java,machine-learning,neural-network

AI is being set to the output value from the leftNeuron of the previous connection (whatever node that is connecting to the current one). The way the back propagation algorithm works is by going through every layer in the ANN, and every node in it, then summing up all of...

machine-learning,normalization

Further, you plan to use both feature scaling (dividing by the "max-min", or range, of a feature) and mean normalization. So for any individual feature f: f_norm = (f - f_mean) / (f_max - f_min) e.g. for x2,(midterm exam)^2 = {7921, 5184, 8836, 4761} > x2 <- c(7921, 5184,...
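
The same formula as a short Python sketch, applied to the four x2 values listed above. The function name `mean_normalize` is just a label chosen for this example.

```python
def mean_normalize(values):
    """Apply f_norm = (f - f_mean) / (f_max - f_min) to every value."""
    mean = sum(values) / len(values)
    spread = max(values) - min(values)
    return [(v - mean) / spread for v in values]


x2 = [7921, 5184, 8836, 4761]   # (midterm exam)^2 values from above
x2_norm = mean_normalize(x2)
```

After this step the values are centred on zero and the difference between the largest and smallest is exactly one.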

machine-learning,recommendation-engine,user-profile,cosine-similarity

Similarity measures between objects in cluster analysis are a broad subject. What I would suggest is to consider a 'divide and conquer' approach. Treat the similarity between two user profiles as a weighted average of the similarities of all attributes. Just remember to use normalized values for your attribute similarities before doing...
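
A minimal sketch of that weighted average, assuming each per-attribute similarity has already been normalized to the range 0 to 1. The attribute names and weights are made up for illustration.

```python
def profile_similarity(attribute_sims, weights):
    """Weighted average of per-attribute similarities (each assumed in [0, 1])."""
    total_weight = sum(weights[name] for name in attribute_sims)
    weighted_sum = sum(attribute_sims[name] * weights[name]
                       for name in attribute_sims)
    return weighted_sum / total_weight


# Hypothetical attributes: 'genre' counts three times as much as 'age'
sims = {"age": 1.0, "genre": 0.5}
weights = {"age": 1.0, "genre": 3.0}
score = profile_similarity(sims, weights)
```

How each per-attribute similarity is computed (cosine, overlap, distance-based) can then be chosen separately for every attribute.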

machine-learning,svm,feature-extraction,feature-selection

You can transform the different time series to live in the same coordinate system by solving the orthogonal Procrustes problem. Here are the five arrays of Euler angles that you gave me (they are stored in arr[0] through arr[4] as 169x3 numpy arrays): Now we solve the orthogonal Procrustes problem...

machine-learning,neural-network,deep-learning,caffe,matcaffe

You should look for the file 'synset_words.txt'. It has 1000 lines, and each line provides a description of a different class. For more information on how to get this file (and some others you might need) you can read this. If you want all the labels to be ready-for-use in Matlab,...

python,c,machine-learning,word2vec

There are a number of opportunities to create Word2Vec models at scale. As you pointed out, candidate solutions are distributed (and/or multi-threaded) or GPU. This is not an exhaustive list but hopefully you get some ideas as to how to proceed. Distributed / Multi-threading options: Gensim uses Cython where it...

library(nnet) traininginput <- as.data.frame(runif(50,min=1,max=100)) trainingoutput <- traininginput/2 trainingdata<-cbind(traininginput,trainingoutput) colnames(trainingdata)<-c("Input","Output") net.sqrt2 <- nnet(Output~Input, data=trainingdata, size=0,skip=T, linout=T) Testdata<-data.frame(Input=1:50) net.result2<-predict(net.sqrt2, newdata = Testdata, type="raw") cleanoutput2 <- cbind(Testdata,Testdata/2,as.data.frame(net.result2)) colnames(cleanoutput2)<-c("Input2","Expected...

c++,opencv,machine-learning,neural-network,weight

I've only done a little bit of poking around so far, but what I've seen confirms my first suspicion... It looks as though each time you start the program, the random number generator is seeded to a fixed value: rng = RNG((uint64)-1); So each time you run the program you're...

machine-learning,classification,multilabel-classification

Actually any linear classifier has such a property by design. As I understand, what you want to do is something like feature selection without cut-off of least useful ones. See the paper Mladenić, D., Brank, J., Grobelnik, M., & Milic-Frayling, N. (2004, July). Feature selection using linear classifier weights: interaction...

amazon-web-services,machine-learning,nlp,sentiment-analysis

You can build a good machine learning model for sentiment analysis using Amazon ML. Here is a link to a github project that is doing just that: https://github.com/awslabs/machine-learning-samples/tree/master/social-media Since Amazon ML supports supervised learning as well as text as an input attribute, you need to get a sample of data...

Yes. In fact, there are infinitely many ways to reduce the dimension of the features. It's by no means clear, however, how they perform in practice. A feature reduction usually is done via a principal component analysis (PCA) which involves a singular value decomposition. It finds the directions with highest...

Principal Components do not necessarily have any correlation to classification accuracy. There could be a 2-variable situation where 99% of the variance corresponds to the first PC but that PC has no relation to the underlying classes in the data. Whereas the second PC (which only contributes to 1% of...

machine-learning,computer-vision,neural-network,deep-learning,pylearn

moving the comment to an answer; modifying my previous answer seemed wrong The full dataset may not be properly shuffled so the examples in the test set may be easier to classify. Doing the experiment again with examples redistributed among the train / valid / test subsets would show if...

algorithm,machine-learning,apache-spark,mllib

As you said, the rank refers to the presumed latent or hidden factors. For example, if you were measuring how much different people liked movies and tried to cross-predict them then you might have three fields: person, movie, number of stars. Now, let's say that you were omniscient and you knew...

r,machine-learning,hidden-markov-models

The problem of initialization is critical not only for HMMs and HSMMs, but for all learning methods based on a form of the Expectation-Maximization algorithm. EM converges to a local optimum in terms of likelihood between model and data, but it is not guaranteed to reach the global optimum.

python,machine-learning,scikit-learn

This is expected (or at least not so unexpected) behavior with the code you have written: you have two instances labeled as dog in which you have the term this is, so the algorithm learns that this is is related to dog. It might not be what you're after, but...

Your problem can be reproduced by using the following content for your data file: 1,1,0 A,3,1 5,5,0 Because of the if isFloat(splitData[j]) you ignore some values of your data for X. Therefore you end up with a 2D array pod in which some rows have fewer entries than others, resulting...

python,matlab,machine-learning,statistics,random-forest

The Problem There are many reasons why the implementation of a random forest in two different programming languages (e.g., MATLAB and Python) will yield different results. First of all, note that results of two random forests trained on the same data will never be identical by design: random forests often...

matlab,machine-learning,computer-vision,classification,matlab-cvst

Look at Database Toolbox in Matlab. You could just save the classifier variable in a file: save('classifier.mat','classifier') And then load it before executing predict: load('classifier.mat') predictedLabels = predict(classifier, testFeatures); ...

Do a one-hot encoding, if anything. If your data has categorical attributes, it is recommended to use an algorithm that can deal with such data well without the hack of encoding, e.g. decision trees and random forests....

Values of features in Vowpal Wabbit can only be real numbers. If you have a categorical feature with n possible values you simply represent it as n binary features (so e.g. color=red is a name of a binary feature and its value is 1 by default). If you have a...
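
To illustrate the "n binary features" idea independently of any tool, here is a plain-Python sketch. The helper name `one_hot` and the colour list are assumptions, and this is not Vowpal Wabbit's input syntax.

```python
def one_hot(value, categories):
    """Represent one categorical value as len(categories) binary features."""
    return [1 if value == category else 0 for category in categories]


colors = ["red", "green", "blue"]    # hypothetical set of possible values
encoded_red = one_hot("red", colors)
encoded_blue = one_hot("blue", colors)
```

Exactly one of the n features is 1 for any given example, and no artificial ordering between the categories is introduced.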

I don't think that it could generate class probabilities when I first added the model. I'm not sure why your version didn't work but here is what I'm adding to the package: modelInfo <- list(label = "Random Forest by Randomization", library = c("extraTrees"), loop = NULL, type = c('Regression', 'Classification'),...

machine-learning,neural-network

A neural network is fine for this. Your output would be the 10 coefficients. Comparing them "two by two" is nothing that influences the net architecture. The standard neural net training procedure takes care of "comparing the items" (if you want to call it that) itself. Finally, make sure to know...

javascript,machine-learning,neural-network

Arrays in JavaScript are zero based. Therefore you have to use document.write(output[0]);. Maybe it would be helpful to use a console.log or even better a debugger; statement. This way you can inspect your variables through the JS Console. More info on debugging can be found here....

machine-learning,neural-network,genetic-algorithm,evolutionary-algorithm

You can include as many hidden layers as you want, starting from zero (that case is called a perceptron). The ability to represent unknown functions, however, does -- in principle -- not increase. Single-hidden layer neural networks already possess a universal representation property: by increasing the number of hidden neurons, they can...

java,scala,calendar,machine-learning

http://jollyday.sourceforge.net/ Apache 2 licensed project with holidays. Quality is not good now, at least for my country. But I will make it better :-)

r,machine-learning,nlp,svm,text-mining

This isn't really a programming question, but anyway: If your goal is prediction, as opposed to text classification, usual methods are backoff models (Katz Backoff) and interpolation/smoothing, e.g. Kneser-Ney smoothing. More complicated models like Random Forests are AFAIK not absolutely necessary and may pose problems if you need to make...

machine-learning,cluster-analysis,pca,eigenvalue,eigenvector

As far as I can tell, you have mixed and shuffled a number of approaches. No wonder it doesn't work... You could simply use Jaccard distance (a simple inversion of Jaccard similarity) + hierarchical clustering; you could do MDS to project your data, then k-means (probably what you are trying...

python,machine-learning,nlp,scikit-learn,one-hot

In order to use the OneHotEncoder, you can split your documents into tokens and then map every token to an id (that is always the same for the same string). Then apply the OneHotEncoder to that list. The result is by default a sparse matrix. Example code for two simple...

python,pandas,machine-learning,data-mining

The python machine learning library scikit-learn is most appropriate in your case. There is a sub-module called feature_selection that fits your needs exactly. Here is an example. from sklearn.datasets import make_regression # simulate a dataset with 500 factors, but only 5 out of them are truly # informative factors, all the...

python,syntax,machine-learning,scikit-learn

The GaussianNB() implemented in scikit-learn does not allow you to set class prior. If you read the online documentation, you see .class_prior_ is an attribute rather than a parameter. Once you fit the GaussianNB(), you can get access to the class_prior_ attribute. It is calculated by simply counting the number of different...

The difference between Latent Semantic Analysis and so-called Explicit Semantic Analysis lies in the corpus that is used and in the dimensions of the vectors that model word meaning. Latent Semantic Analysis starts from document-based word vectors, which capture the association between each word and the documents in which it...

python,machine-learning,scikit-learn

Sounds like you want to use features which are themselves multi-dimensional. I'm not sure this works. Consider the increase in complexity that would occur for a distance-based metric like KNN; multi-dimensional features would require distance metrics and would get a lot more involved. I'd first try just flattening the arrays,...

machine-learning,weka,svm,libsvm

Yes, the default kernel is RBF with gamma equal to 1/k. See other defaults in javadocs here or here. NB: Weka contains its own implementation - SMO, but it also provides a wrapper for libsvm, and "LibSVM runs faster than SMO" (note that it requires installed libsvm, see docs)....

python,machine-learning,data.frame

For your specific example in column: 7: bcw = bcw[bcw[7] != '?'] However, I actually downloaded the dataset and found the same anomaly in column: 6, so this code will look through all columns for '?' and remove the rows: for col in bcw.columns: if bcw[col].dtype != 'int64': print "Removing...

numpy,machine-learning,scipy,hierarchical-clustering

You're on the right track with converting the data into a table like the one on the linked page (a redundant distance matrix). According to the documentation, you should be able to pass that directly into scipy.cluster.hierarchy.linkage or a related function, such as scipy.cluster.hierarchy.single or scipy.cluster.hierarchy.complete. The related functions explicitly...

You almost have it right. The Likelihood of a model (theta) for the observed data (X) is the probability of observing X, given theta: L(theta|X) = P(X|theta) For Maximum Likelihood Estimation (MLE), you choose the value of theta that provides the greatest value of P(X|theta). This does not necessarily mean...
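
As a concrete toy case, here is a sketch that picks theta for a coin-flip (Bernoulli) model by maximizing P(X|theta) over a grid of candidate values. The data, the grid and the function name are assumptions for illustration; the log of the likelihood is used because it has the same maximizer and is numerically safer.

```python
import math


def log_likelihood(theta, flips):
    """log P(X | theta) for independent coin flips (1 = heads, 0 = tails)."""
    return sum(math.log(theta if flip else 1.0 - theta) for flip in flips)


flips = [1, 1, 0, 1, 0, 1, 1, 1]              # made-up data: 6 heads in 8 flips
candidates = [i / 100 for i in range(1, 100)]  # theta values 0.01 .. 0.99

# MLE: the candidate theta under which the observed data is most probable
theta_hat = max(candidates, key=lambda t: log_likelihood(t, flips))
```

For this model the maximizer coincides with the observed fraction of heads.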

python,python-2.7,numpy,pandas,machine-learning

Regarding the main question, thanks to Evert for the advice; I will check it. Regarding #2: I found a great tutorial http://www.markhneedham.com/blog/2013/11/09/python-making-scikit-learn-and-pandas-play-nice/ and achieved the desired result with pandas + sklearn...

python,machine-learning,scikit-learn,decision-tree

I finally got it to work. Here is one solution based on my correspondence message in the scikit-learn mailing list: After scikit-learn version 0.16.1, apply method is implemented in clf.tree_, therefore, I followed the following steps: update scikit-learn to the latest version (0.16.1) so that you can use apply method...

python,machine-learning,scikit-learn

With scikit-learn, initialising the model, training the model and getting the predictions are separate steps. In your case you have: train_fea = np.array([[1,1,0],[0,0,1],[1,np.nan,0]]) train_fea array([[ 1., 1., 0.], [ 0., 0., 1.], [ 1., nan, 0.]]) #initialise the model imp = Imputer(missing_values='NaN', strategy='mean', axis=0) #train the model imp.fit(train_fea) #get the...

python,machine-learning,nlp,nltk,pos-tagger

In short: NLTK is not perfect. In fact, no model is perfect. In long: Try using another tagger (see https://github.com/nltk/nltk/tree/develop/nltk/tag) , e.g.: HunPos Stanford POS Senna Using default MaxEnt POS tagger from NLTK, i.e. nltk.pos_tag: >>> from nltk import word_tokenize, pos_tag >>> text = "The quick brown fox jumps over...

machine-learning,artificial-intelligence,neural-network

You can either have NxM boolean inputs or have N inputs where each one is a float that goes from 0 to 1. In the latter case the float values would be: {A/M, B/M, C/M, ... 1}. For example if you have 4 inputs each one with discrete values: {1,2,3,4}...
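
Both encodings can be sketched in a few lines. The function names are hypothetical, and the sketch assumes every input takes integer values 1..M.

```python
def encode_scaled(values, m):
    """N inputs, each a float in (0, 1]: value k becomes k / M."""
    return [v / m for v in values]


def encode_boolean(values, m):
    """N x M boolean inputs: one flag per (input, possible value) pair."""
    bits = []
    for v in values:
        bits.extend(1 if v == level else 0 for level in range(1, m + 1))
    return bits


inputs = [1, 2, 3, 4]                   # 4 inputs, each with values in {1,2,3,4}
scaled = encode_scaled(inputs, 4)       # 4 network inputs
flags = encode_boolean(inputs, 4)       # 16 network inputs
```

The scaled form is more compact but implies an ordering between the values; the boolean form does not.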

As suggested in the comments, the error is because x is of dimension 3x2 and theta of dimension 1x2, so you can't do X*theta. I suspect you want: theta = [0;1]; % note the ; instead of , % theta is now of dimension 2x1 % X*theta is now a...

machine-learning,recommendation-engine,collaborative-filtering,predictionio,content-based-retrieval

If I understand correctly that you extract feature vectors for the items from users-like-items data, then it is pure item-based CF. In order to be content based filtering, features of the item itself should be used: for example, if the items are movies, content based filtering should utilize such features...

python,machine-learning,scikit-learn,random-forest

I get more than one digit in my results. Are you sure it is not due to your dataset? (For example, using a very small dataset would lead to simple decision trees and so to 'simple' probabilities.) Otherwise it may only be the display that shows one digit,...

java,machine-learning,artificial-intelligence,neural-network

This is the standard backpropagation algorithm where it is backpropagating the error through all the hidden layers. Unless we are in the output layer, the error for a neuron in a hidden layer is dependent on the succeeding layer. Let's assume that we have a particular neuron a with synapses...
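
The dependence on the succeeding layer can be written down in a couple of lines. This is a sketch assuming sigmoid activations (so the derivative is out * (1 - out)); the function name and the numbers are made up and are not taken from the code in the question.

```python
def hidden_delta(activation, outgoing_weights, next_deltas):
    """Error term of a hidden neuron, given the deltas of the layer after it.

    Assumes a sigmoid activation, whose derivative is activation * (1 - activation).
    """
    # Error flowing back along each outgoing synapse, weighted by that synapse
    downstream = sum(w * d for w, d in zip(outgoing_weights, next_deltas))
    return activation * (1.0 - activation) * downstream


# Neuron with output 0.5 feeding two neurons whose deltas are already known
delta_a = hidden_delta(0.5, [1.0, -2.0], [0.1, 0.2])
```

This is why the deltas are computed from the output layer backwards: each hidden delta needs the deltas of the layer that follows it.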

python,machine-learning,scikit-learn

A simple yet effective idea would be to train separate classifiers for text and numeric data. Make sure you normalize as you go. Now when you have, say, two different classifiers, you can combine their results to predict whether it is a spam or not. Check http://scikit-learn.org/stable/modules/ensemble.html To further improve...

java,machine-learning,bigdata,distributed-computing

Based on your four keywords (java, machine-learning, bigdata and distributed-computing), I come to the conclusion that you want something like Hadoop. It's a perfect choice for natural processing too. Then again, I don't have any details of your problem, but you'll be surprised what Hadoop can do. Alternatively for...

python,machine-learning,scipy,scikit-learn,pipeline

You can write your own transformer that'll transform input into predictions. Something like this:

    import sklearn.base

    class PredictionTransformer(sklearn.base.BaseEstimator, sklearn.base.TransformerMixin):
        def __init__(self, estimator):
            self.estimator = estimator

        def fit(self, X, y):
            # Train the wrapped estimator; return self so the step chains in a pipeline
            self.estimator.fit(X, y)
            return self

        def transform(self, X):
            # Expose the class probabilities as the transformed features
            return self.estimator.predict_proba(X)

Then you can use FeatureUnion to glue your transformers together. That said, there's a...

machine-learning,amazon,prediction,ibm-watson,predictionio

No, prediction does not only run on numerical fields. It could be anything including text. My guess is that the MovieLens data uses IDs instead of actual user and movie names because this saves storage space (this dataset has been around for a long time, and back then storage was definitely...

python,machine-learning,scikit-learn,linear-regression,multivariate-testing

This is a mathematical/stats question, but I will try to answer it here anyway. The outcome you see is absolutely expected. A linear model like this won't take correlation between dependent variables into account. If you had only one dependent variable, your model would essentially consist of a weight vector...

math,machine-learning,neural-network,linear-algebra,perceptron

A linear function is f(x) = a x + b. If we take another linear function g(z) = c z + d, and apply g(f(x)) (which would be the equivalent of feeding the output of one linear layer as the input to the next linear layer) we get g(f(x)) =...
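
A quick numeric check of that composition: expanding c*(a*x + b) + d gives (c*a)*x + (c*b + d), which is again of the form "slope times x plus offset". The helper names and coefficients below are arbitrary choices for this sketch.

```python
def linear(slope, offset):
    """Return the function x -> slope * x + offset."""
    return lambda x: slope * x + offset


def compose_linear(a, b, c, d):
    """Collapse g(f(x)) with f(x) = a*x + b and g(z) = c*z + d into one linear map."""
    # c*(a*x + b) + d = (c*a)*x + (c*b + d)
    return linear(c * a, c * b + d)


f = linear(2, 3)                      # f(x) = 2x + 3
g = linear(5, -1)                     # g(z) = 5z - 1
h = compose_linear(2, 3, 5, -1)       # single equivalent "layer"
same = all(g(f(x)) == h(x) for x in range(-5, 6))
```

So stacking linear layers without a non-linearity in between adds no representational power.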

r,machine-learning,sparse-matrix,reshape2

A base R option would be (!!table(cbind(df1[1],stack(df1[-1])[-2])))*1L # values #ID 19 23 42 61 anxiety asthma copd diabetes female male # 1 0 0 1 0 1 1 0 0 0 1 # 2 1 0 0 0 0 1 0 0 0 1 # 3 0 1 0 0...

machine-learning,artificial-intelligence,neural-network,backpropagation

I agree with the comments that this model is probably not the best for your classification problem but if you are interested in trying to get this to work I will give you the reason I think this oscillates and the way that I would try and tackle this problem....