Is there any way to reduce the dimension of the following features from 2D coordinate (x,y) to one dimension?

Tag: machine-learning

Yes. In fact, there are infinitely many ways to *reduce the dimension* of the features. It's by no means clear, however, how they perform in practice.

A feature reduction is usually done via principal component analysis (PCA), which involves a singular value decomposition. It finds the directions with highest variance -- that is, those directions in which "something is going on".

In your case, a PCA might find the black line as one of the two principal components:

One can see that the three feature sets can roughly be separated according to their projection onto this line. Thus, one can associate ranges on this line with one feature set each -- see the coloured ranges in the picture. (Note that in the case of your example, it is even possible to completely separate the data sets.)

So, yes, in case of your example it could be possible to make such a strong reduction from 2D to 1D, and, at the same time, even obtain a reasonable model.
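A minimal sketch of such a 2D-to-1D reduction with scikit-learn's PCA, using three made-up Gaussian clusters standing in for the three feature sets (the data here is illustrative, not from the question):

```python
import numpy as np
from sklearn.decomposition import PCA

# Three made-up 2D clusters standing in for the three feature sets
rng = np.random.RandomState(0)
a = rng.normal(loc=[0, 0], scale=0.3, size=(20, 2))
b = rng.normal(loc=[2, 2], scale=0.3, size=(20, 2))
c = rng.normal(loc=[4, 4], scale=0.3, size=(20, 2))
X = np.vstack([a, b, c])

# Keep only the first principal component: 2D -> 1D
pca = PCA(n_components=1)
X_1d = pca.fit_transform(X)

print(X_1d.shape)  # one coordinate per sample: (60, 1)
```

Here the projections of the three clusters onto the first principal component fall into three separated ranges, exactly the situation described above.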

java,machine-learning,artificial-intelligence,neural-network

This is the standard backpropagation algorithm where it is backpropagating the error through all the hidden layers. Unless we are in the output layer, the error for a neuron in a hidden layer is dependent on the succeeding layer. Let's assume that we have a particular neuron a with synapses...

machine-learning,neural-network,genetic-algorithm,evolutionary-algorithm

You can include as many hidden layers as you want, starting from zero -- that case is called a perceptron. The ability to represent unknown functions, however, does not -- in principle -- increase. Single-hidden-layer neural networks already possess a universal representation property: by increasing the number of hidden neurons, they can...

machine-learning,neural-network

If you are talking about the session-based course (which I have passed previously): https://www.coursera.org/learn/machine-learning then it uses a batch-learning approach in exercise 4 (which covers ANNs). If you carefully study the cost function you will see that it is calculated using all available examples, not just one randomly chosen....

I don't think that it could generate class probabilities when I first added the model. I'm not sure why your version didn't work but here is what I'm adding to the package: modelInfo <- list(label = "Random Forest by Randomization", library = c("extraTrees"), loop = NULL, type = c('Regression', 'Classification'),...

Do a one-hot encoding, if anything. If your data has categorical attributes, it is recommended to use an algorithm that can deal with such data well without the hack of encoding, e.g. decision trees and random forests....

r,machine-learning,classification,regression,caret

Look at the help page ?models. Also, here are some links too. Also: > is_class <- unlist(lapply(mods, function(x) any(x$type == "Classification"))) > class_mods <- names(is_class)[is_class] > head(class_mods) [1] "ada" "AdaBag" "AdaBoost.M1" "amdai" "avNNet" [6] "bag" ...

On systems supporting /dev/stdout (and /dev/stderr), you may try this: vw -t -i model.vw --daemon --port 26542 --link=logistic -r /dev/stdout The daemon will write raw predictions into standard output which in this case end up in the same place as localhost port 26542. The relative order of lines is guaranteed...

python,machine-learning,nlp,nltk,pos-tagger

In short: NLTK is not perfect. In fact, no model is perfect. In long: Try using another tagger (see https://github.com/nltk/nltk/tree/develop/nltk/tag), e.g.: HunPos, Stanford POS, Senna. Using the default MaxEnt POS tagger from NLTK, i.e. nltk.pos_tag: >>> from nltk import word_tokenize, pos_tag >>> text = "The quick brown fox jumps over...

machine-learning,cross-validation

A t-test is a type of statistical test on your data. Say you are comparing two datasets and you want to know if the two data sets are significantly different from each other. Then you will do a t-test. Cross-validation is more of a technique for evaluating your models. Typically...

java,machine-learning,bigdata,distributed-computing

Based on your four keywords -- java, machine-learning, bigdata and distributed-computing -- I come to the conclusion that you want something like Hadoop. It's a perfect choice for natural language processing too. Then again, I don't have any details of your problem, but you'll be surprised what Hadoop can do. Alternatively for...

python,numpy,encoding,machine-learning,scikit-learn

If I understand your question correctly, then it is simply quadratic programming - all the constraints you mentioned (both equalities and inequalities) are linear.

python,machine-learning,data.frame

For your specific example in column: 7: bcw = bcw[bcw[7] != '?'] However, I actually downloaded the dataset and found the same anomaly in column: 6, so this code will look through all columns for '?' and remove the rows: for col in bcw.columns: if bcw[col].dtype != 'int64': print "Removing...

python,machine-learning,scikit-learn,random-forest

I get more than one digit in my results; are you sure it is not due to your dataset? (For example, using a very small dataset would yield simple decision trees and thus "simple" probabilities.) Otherwise it may only be the display that shows one digit,...

python,python-2.7,numpy,pandas,machine-learning

Regarding the main question: thanks to Evert for the advice, I will check. Regarding #2: I found a great tutorial http://www.markhneedham.com/blog/2013/11/09/python-making-scikit-learn-and-pandas-play-nice/ and achieved the desired result with pandas + sklearn...

Values of features in Vowpal Wabbit can only be real numbers. If you have a categorical feature with n possible values you simply represent it as n binary features (so e.g. color=red is a name of a binary feature and its value is 1 by default). If you have a...
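A small sketch of the encoding described above, rendering one line of VW's plain-text input format (the `label | feature` layout and the name-only convention for binary features are VW's real format; the helper function and feature names are made up for illustration):

```python
def vw_line(label, features):
    """Render one Vowpal Wabbit input line.

    Categorical values become name-only binary features (implicit value 1),
    e.g. color=red; numeric values are written as name:value.
    """
    parts = []
    for name, value in features.items():
        if isinstance(value, str):
            parts.append(f"{name}={value}")   # binary feature, value 1
        else:
            parts.append(f"{name}:{value}")   # real-valued feature
    return f"{label} | " + " ".join(parts)

print(vw_line(1, {"color": "red", "price": 2.5}))
# 1 | color=red price:2.5
```

So a categorical feature with n possible values simply yields n distinct feature names, of which exactly one appears on any given line.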

machine-learning,recommendation-engine,collaborative-filtering,predictionio,content-based-retrieval

If I understand correctly that you extract feature vectors for the items from users-like-items data, then it is pure item-based CF. In order to be content based filtering, features of the item itself should be used: for example, if the items are movies, content based filtering should utilize such features...

algorithm,machine-learning,apache-spark,mllib

As you said, the rank refers to the presumed latent or hidden factors. For example, if you were measuring how much different people liked movies and tried to cross-predict them, then you might have three fields: person, movie, number of stars. Now, let's say that you were omniscient and you knew...

As suggested in the comments, the error is because x is of dimension 3x2 and theta of dimension 1x2, so you can't do X*theta. I suspect you want: theta = [0;1]; % note the ; instead of , % theta is now of dimension 2x1 % X*theta is now a...

machine-learning,nlp,scikit-learn,svm,confusion-matrix

The classification report should be straightforward -- a report of P/R/F-measure for each element in your test data. In multiclass problems, it is not a good idea to read Precision/Recall and F-measure over the whole data: any imbalance would make you feel you've reached better results. That's where such reports help....

machine-learning,scikit-learn,classification,weka,libsvm

You can look at RandomForest, which is a well-known classifier and quite efficient. In scikit-learn, you have some classes that can be used over several cores, like RandomForestClassifier. It has a constructor parameter that can be used to define the number of cores, or a value that will use...
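A hedged sketch of the parameter in question: in scikit-learn, `n_jobs` on RandomForestClassifier sets the number of cores, and `n_jobs=-1` uses all available ones (the toy dataset here is synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic toy dataset, just to have something to fit
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# n_jobs=-1 -> grow the trees in parallel on every available core
clf = RandomForestClassifier(n_estimators=50, n_jobs=-1, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))
```

The same `n_jobs` parameter also parallelises prediction, so it helps at both training and inference time.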

machine-learning,probability,mle,language-model

The likelihood function describes the probability of generating a set of training data given some parameters and can be used to find those parameters which generate the training data with maximum probability. You can create the likelihood function for a subset of the training data, but that wouldn't be represent...

python,c,machine-learning,word2vec

There are a number of opportunities to create Word2Vec models at scale. As you pointed out, candidate solutions are distributed (and/or multi-threaded) or GPU. This is not an exhaustive list but hopefully you get some ideas as to how to proceed. Distributed / Multi-threading options: Gensim uses Cython where it...

python,syntax,machine-learning,scikit-learn

The GaussianNB() implemented in scikit-learn does not allow you to set the class prior. If you read the online documentation, you see that .class_prior_ is an attribute rather than a parameter. Once you fit the GaussianNB(), you can access the class_prior_ attribute. It is calculated by simply counting the number of different...
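A small sketch of that attribute on made-up data: with 3 samples of class 0 and 5 of class 1, the fitted `class_prior_` is just the class frequencies [3/8, 5/8]:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy 1-feature dataset: 3 samples of class 0, 5 samples of class 1
X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2], [1.3], [1.4]])
y = np.array([0, 0, 0, 1, 1, 1, 1, 1])

gnb = GaussianNB()
gnb.fit(X, y)

# class_prior_ is estimated from class counts: [3/8, 5/8]
print(gnb.class_prior_)
```

So the prior is purely empirical; to impose a different prior you would have to reweight or resample the training data.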

Sure, it already exists. Please check class Vote.

javascript,machine-learning,neural-network

Arrays in JavaScript are zero-based. Therefore you have to use document.write(output[0]);. Maybe it would be helpful to use console.log or, even better, a debugger; statement. This way you can inspect your variables through the JS console. More info on debugging can be found here....

Principal Components do not necessarily have any correlation to classification accuracy. There could be a 2-variable situation where 99% of the variance corresponds to the first PC but that PC has no relation to the underlying classes in the data. Whereas the second PC (which only contributes to 1% of...

machine-learning,normalization

Further, you plan to use both feature scaling (dividing by the "max-min", or range, of a feature) and mean normalization. So for any individual feature f: f_norm = (f - f_mean) / (f_max - f_min) e.g. for x2 = (midterm exam)^2 = {7921, 5184, 8836, 4761}: > x2 <- c(7921, 5184,...
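The f_norm formula above, worked through in Python on the x2 values from the answer (mean 6675.5, range 8836 - 4761 = 4075):

```python
# Mean normalization + feature scaling: f_norm = (f - mean) / (max - min)
x2 = [7921, 5184, 8836, 4761]

mean_ = sum(x2) / len(x2)    # 6675.5
range_ = max(x2) - min(x2)   # 8836 - 4761 = 4075

x2_norm = [(v - mean_) / range_ for v in x2]
print([round(v, 4) for v in x2_norm])
# [0.3056, -0.366, 0.5302, -0.4698]
```

After this transform the feature has zero mean and a total spread (max minus min) of exactly 1, which is the point of combining the two steps.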

machine-learning,neural-network,backpropagation,feed-forward

In short, yes it is a good approach to use a single network with multiple outputs. The first hidden layer describes decision boundaries (hyperplanes) in your feature space and multiple digits can benefit from some of the same hyperplanes. While you could create one ANN for each digit, that kind...

machine-learning,cluster-analysis,pca,eigenvalue,eigenvector

As far as I can tell, you have mixed and shuffled a number of approaches. No wonder it doesn't work... You could simply use Jaccard distance (a simple inversion of Jaccard similarity) + hierarchical clustering. You could do MDS to project your data, then k-means (probably what you are trying...

math,machine-learning,neural-network,linear-algebra,perceptron

A linear function is f(x) = a x + b. If we take another linear function g(z) = c z + d, and apply g(f(x)) (which would be the equivalent of feeding the output of one linear layer as the input to the next linear layer) we get g(f(x)) =...
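The composition argument can be checked numerically: g(f(x)) = c(ax + b) + d = (ca)x + (cb + d), again a linear function, which is why stacking linear layers adds no representational power (the constants below are arbitrary):

```python
# Composing two linear functions yields another linear function:
# g(f(x)) = c*(a*x + b) + d = (c*a)*x + (c*b + d)
a, b = 2.0, 3.0    # f(x) = a*x + b
c, d = 5.0, 7.0    # g(z) = c*z + d

f = lambda x: a * x + b
g = lambda z: c * z + d

slope = c * a              # 10.0
intercept = c * b + d      # 22.0

# The composition matches the single linear function slope*x + intercept
for x in [-1.0, 0.0, 4.0]:
    assert g(f(x)) == slope * x + intercept
print(slope, intercept)    # 10.0 22.0
```

This is the reason a non-linear activation function between layers is needed: without it, any depth of network collapses to one linear map.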

image-processing,machine-learning,computer-vision

Test, Train and Validate Read this stats SE question: What is the difference between test set and validation set? This is basic machine learning, so you should probably go back and review your course literature, since it seems like you're missing some pretty important machine learning concepts. Do we...

machine-learning,statistics,classification,multilabel-classification

There are several available metrics, described in the following paper: Sokolova, Marina, and Guy Lapalme. "A systematic analysis of performance measures for classification tasks." Information Processing & Management 45.4 (2009): 427-437. See Table 3 on page 4 (430) - it contains brief description and formula for 8 metrics; choose the...

java,machine-learning,neural-network

AI is being set to the output value from the leftNeuron of the previous connection (whatever node is connecting to the current one). The way the backpropagation algorithm works is by going through every layer in the ANN, and every node in it, then summing up all of...

machine-learning,weka,svm,libsvm

Yes, the default kernel is RBF with gamma equal to 1/k. See other defaults in javadocs here or here. NB: Weka contains its own implementation - SMO, but it also provides wrapper for libsvm, and "LibSVM runs faster than SMO" (note that it requires installed libsvm, see docs)....

machine-learning,amazon,prediction,ibm-watson,predictionio

No, prediction does not only run on numerical fields. It could be anything, including text. My guess is that the MovieLens data uses IDs instead of actual user and movie names because this saves storage space (this dataset has been around for a long time, and back then storage was definitely...

machine-learning,pattern-recognition,bayesian-networks

Is this the right approach? There are many possible approaches, but here's a very simple and effective one that fits the domain: Given the nature of the application, chronological order doesn't really matter; it doesn't matter, e.g., whether the Fan gets turned on before the Light. Also, given that you...

python,pandas,machine-learning,data-mining

The Python machine learning library scikit-learn is the most appropriate in your case. There is a sub-module called feature_selection that fits your needs exactly. Here is an example. from sklearn.datasets import make_regression # simulate a dataset with 500 factors, but only 5 out of them are truly # informative factors, all the...
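A runnable sketch along the lines the answer starts: simulate 500 factors with only 5 informative ones, then use SelectKBest from feature_selection (the choice of the univariate f_regression score is my assumption, not stated in the truncated answer):

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

# Simulate 500 factors, of which only 5 are truly informative
X, y = make_regression(n_samples=300, n_features=500, n_informative=5,
                       noise=0.1, random_state=0)

# Keep the 5 features with the strongest univariate relationship to y
selector = SelectKBest(score_func=f_regression, k=5)
selector.fit(X, y)
print(sorted(selector.get_support(indices=True)))
```

`get_support(indices=True)` returns the column indices of the selected factors, which can then be used to slice the original data frame or array.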

python,machine-learning,scikit-learn

With scikit-learn, initialising the model, training the model and getting the predictions are separate steps. In your case you have: train_fea = np.array([[1,1,0],[0,0,1],[1,np.nan,0]]) train_fea array([[ 1., 1., 0.], [ 0., 0., 1.], [ 1., nan, 0.]]) #initialise the model imp = Imputer(missing_values='NaN', strategy='mean', axis=0) #train the model imp.fit(train_fea) #get the...
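The same three steps, as a runnable sketch. Note that newer scikit-learn versions replaced `Imputer` with `SimpleImputer` (in `sklearn.impute`), so that is used here; the data is the array from the answer:

```python
import numpy as np
from sklearn.impute import SimpleImputer

train_fea = np.array([[1, 1, 0],
                      [0, 0, 1],
                      [1, np.nan, 0]])

# initialise the model
imp = SimpleImputer(missing_values=np.nan, strategy='mean')
# train the model: learns the per-column means, ignoring NaN
imp.fit(train_fea)
# get the predictions: the NaN is replaced by its column mean, (1+0)/2 = 0.5
print(imp.transform(train_fea))
```

Keeping fit and transform separate means the means learned from training data can later be applied unchanged to test data.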

matlab,machine-learning,computer-vision,classification,matlab-cvst

Look at Database Toolbox in Matlab. You could just save the classifier variable in a file: save('classifier.mat','classifier') And then load it before executing predict: load('classifier.mat') predictedLabels = predict(classifier, testFeatures); ...

amazon-web-services,machine-learning,nlp,sentiment-analysis

You can build a good machine learning model for sentiment analysis using Amazon ML. Here is a link to a github project that is doing just that: https://github.com/awslabs/machine-learning-samples/tree/master/social-media Since the Amazon ML supports supervised learning as well as text as input attribute, you need to get a sample of data...

machine-learning,cluster-analysis,k-means,hierarchical-clustering

The idea I'm suggesting originates in text processing, NLP and information retrieval, and is very widely used in situations where you have sequences of characters/information, like genetic information. Because you have to preserve the sequence, we can use the concept of n-grams. I'm using bi-grams in the following example, though you...
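A minimal sketch of the n-gram idea on a made-up genetic-style string (the answer's own example is truncated, so this is illustrative):

```python
def ngrams(sequence, n=2):
    """Return the list of overlapping n-grams of a sequence."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]

# Bi-grams preserve local order information from the sequence
print(ngrams("GATTACA", n=2))
# [('G', 'A'), ('A', 'T'), ('T', 'T'), ('T', 'A'), ('A', 'C'), ('C', 'A')]
```

Each sequence can then be represented by its bag of bi-grams, and standard set/vector similarities (e.g. Jaccard or cosine) become usable for clustering while order information is retained.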

machine-learning,classification,multilabel-classification

Actually any linear classifier has such a property by design. As I understand it, what you want to do is something like feature selection without cutting off the least useful features. See the paper: Mladenić, D., Brank, J., Grobelnik, M., & Milic-Frayling, N. (2004, July). Feature selection using linear classifier weights: interaction...

java,optimization,machine-learning,scipy,stanford-nlp

What you have should be just fine. (Have you actually had any problems with it?) Setting termination both on max iterations and max function evaluations is probably overkill, so you might omit the last argument to qn.minimize(), but it seems from the documentation that scipy does use both with a...

machine-learning,gradient-descent

Mathematically, we are trying to minimise the error function Error(θ) = Σ_i (y_i - h(x_i))^2 (summation over i). To minimise the error, we set d(Error(θ))/dθ_j = 0, substitute h(x_i) = Σ_j θ_j x_ij, and derive the above formula. The rest of the formulation can be reasoned as: gradient descent uses the...
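A minimal numeric sketch of that minimisation by gradient descent, with a single parameter and made-up data generated by y = 2x (so the optimum is θ = 2):

```python
# Gradient descent on Error(theta) = sum_i (y_i - theta*x_i)^2
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]      # generated by y = 2*x, so the optimum is theta = 2

theta = 0.0
lr = 0.01                  # learning rate
for _ in range(1000):
    # d(Error)/d(theta) = sum_i -2 * x_i * (y_i - theta * x_i)
    grad = sum(-2 * x * (y - theta * x) for x, y in zip(xs, ys))
    theta -= lr * grad     # step against the gradient

print(round(theta, 6))     # converges to 2.0
```

Instead of solving d(Error)/dθ = 0 in closed form, gradient descent just walks downhill along the derivative until it reaches that same stationary point.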

machine-learning,artificial-intelligence,neural-network

You can either have NxM boolean inputs or have N inputs where each one is a float that goes from 0 to 1. In the latter case the float values would be: {A/M, B/M, C/M, ... 1}. For example if you have 4 inputs each one with discrete values: {1,2,3,4}...

machine-learning,statistics,linear-regression

So linear regression assumes your data is linear, even in multiple dimensions. It won't be possible to visualize high-dimensional data unless you use some method to reduce its dimensionality. PCA can do that, but bringing it down to 2 dimensions won't be helpful. You should do cross...

c++,opencv,machine-learning,neural-network,weight

I've only done a little bit of poking around so far, but what I've seen confirms my first suspicion... It looks as though each time you start the program, the random number generator is seeded to a fixed value: rng = RNG((uint64)-1); So each time you run the program you're...

One thing you can do is calculate correlation between rows. Take a look at the tutorial about summary statistics at mllib website. More advanced approach would be use dimensionality reduction. This should discover more complex dependencies....

You almost have it right. The likelihood of a model (theta) for the observed data (X) is the probability of observing X given theta: L(theta|X) = P(X|theta). For Maximum Likelihood Estimation (MLE), you choose the value of theta that provides the greatest value of P(X|theta). This does not necessarily mean...
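A tiny worked MLE example of picking the theta that maximises P(X|theta), for made-up data of 7 heads in 10 coin flips (the analytic maximiser is the sample frequency 0.7; a grid search recovers it):

```python
# MLE for a coin's heads-probability theta, given 7 heads in 10 flips:
# L(theta | X) = P(X | theta) = theta^7 * (1 - theta)^3
heads, n = 7, 10

def likelihood(theta):
    return theta ** heads * (1 - theta) ** (n - heads)

# Grid search over candidate parameter values in [0, 1]
grid = [i / 1000 for i in range(1001)]
theta_mle = max(grid, key=likelihood)
print(theta_mle)   # 0.7
```

Note that the maximiser 0.7 is the theta that makes the observed data most probable; the probability P(X|theta) itself stays small, which is the distinction the answer goes on to make.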