mongodb,indexing,unique-constraint,sparse

ensureIndex creates an index on the specified field if the index does not already exist. If you want to change an index, you have to drop the index first and then call ensureIndex again with your new options. collection.User.dropIndex("username_1"); collection.User.ensureIndex({username:1}, {unique: true, sparse:true}) Taken from the mongodb documentation: To add...

r,matrix,username,elements,sparse

You can use sparseMatrix from the Matrix package: require(Matrix) #this just to generate some random strings require(stringi) set.seed(1) #generating 100k usernames users<-stri_rand_strings(100000,6) #simulating col1 and col2 col1<-sample(users,1000000,T) col2<-sample(users,1000000,T) #hashing to integer values through factor col1<-factor(col1,levels=users) col2<-factor(col2,levels=users) #creating the matrix mySparseMatrix<-sparseMatrix(as.numeric(col1),as.numeric(col2),x=1) #not a huge object...

Use mtx.dot instead of np.dot, as in blured = mtx.dot(img) or just blured = mtx * img # where mtx is sparse and img is `dense` or `sparse`. np.dot treats both of its arguments as dense even when one of them is sparse, so it will raise MemoryError...
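
A minimal sketch of the advice above, assuming SciPy's matrix classes. The small `mtx` and `img` below are hypothetical stand-ins for the question's data. The `@` operator is used instead of `*` because `*` means matrix multiplication only for the `*_matrix` classes; SciPy's newer sparse array classes treat it as element-wise.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins for the answer's `mtx` (sparse operator) and `img` (dense data).
mtx = sparse.identity(4, format="csr") * 0.5
img = np.arange(16, dtype=float).reshape(4, 4)

# The sparse matrix's own .dot method keeps mtx sparse during the product.
blurred = mtx.dot(img)

# The @ operator dispatches to the same sparse routine.
blurred_alt = mtx @ img
```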

matlab,matrix,sparse-matrix,sparse

As @beaker mentioned above, and as I explain in my Accelerating Matlab Performance book, using the sparse or spdiags functions is much faster than using indexed assignments as in your code. The reason is that any such assignment results in a rebuilding of the sparse data's internal representation, which is...

matrix,permutation,eigen,sparse

If you want to apply a symmetric permutation P * Y * P^-1, then the best option is to use the twistedBy method: SpMat z = y.twistedBy(perm); otherwise you have to apply one and then the other: SpMat z = (perm * y).eval() * perm; ...

sparse-matrix,sparse,outliers,elki

ArrayAdapterDatabaseConnection is designed for dense data. For sparse data, it does not make much sense to first encode it into a dense array, then re-encode it into sparse vectors. Consider reading the data as sparse vectors directly to avoid overhead. The error you are seeing has a different reason, though:...

matlab,sparse-matrix,sparse,adjacency-matrix

Use the spdiags function to convert the degree vector to a sparse diagonal matrix. Then subtract the adjacency matrix from the diagonal matrix to get the Laplacian. Example using your code: adj = spconvert(adj); for i=1:size(adj, 1) degree(i) = CalcDegree(adj, i) end D = spdiags(degree, 0, size(adj, 1), size(adj, 2)); L...
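
For readers working in Python, the same construction can be sketched with SciPy. This is an analogue under assumptions, not the answer's MATLAB code: the question's `CalcDegree` is not shown, so the degree is taken here to be the row sum of a hypothetical unweighted adjacency matrix.

```python
import numpy as np
from scipy import sparse

# Hypothetical 3-node path graph: edges 0-1 and 1-2.
adj = sparse.csr_matrix(np.array([[0, 1, 0],
                                  [1, 0, 1],
                                  [0, 1, 0]], dtype=float))

# Assumption: the degree of each node is its row sum.
degree = np.asarray(adj.sum(axis=1)).ravel()

# Sparse diagonal degree matrix, the analogue of spdiags(degree, 0, n, n).
D = sparse.diags(degree, 0, format="csr")

# Graph Laplacian: degree matrix minus adjacency matrix.
L = D - adj
```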

As it says in Scipy's documentation the lil_matrix only supports instantiation by passing a dense or sparse matrix, or by giving the desired shape (resulting in an empty matrix). One of the main reasons I see that lil_matrix doesn't support this form of instantiation is that the column count will...
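
A short sketch of the shape-based instantiation mentioned above: create an empty lil_matrix of the desired shape and fill it afterwards. The `entries` list is hypothetical illustration data.

```python
from scipy import sparse

# Start from an empty matrix of the desired shape (3 rows, 4 columns here).
m = sparse.lil_matrix((3, 4))

# Hypothetical (row, column, value) entries to fill in one at a time.
entries = [(0, 1, 2.0), (2, 3, 5.0)]
for r, c, v in entries:
    m[r, c] = v

# Convert to CSR once construction is done.
m_csr = m.tocsr()
```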

python,scipy,matrix-multiplication,sparse

I suspect that your sparse matrices are becoming non-sparse when you perform the operation. Have you tried just A.multiply(B)? I suspect that it will be better optimised than anything that you can easily do yourself. If A is not already the correct type of sparse matrix you might need:...
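
A minimal sketch of the suggestion, with small hypothetical matrices standing in for the question's `A` and `B`. The element-wise product of two sparse matrices stays sparse:

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.array([[1, 0, 2], [0, 0, 3]]))
B = sparse.csr_matrix(np.array([[4, 5, 0], [0, 6, 7]]))

# Element-wise product; the result is still a sparse matrix.
C = A.multiply(B)
```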

You can use scipy.io.mmread which does exactly what you want. In [11]: mmread("sparse_from_file") Out[11]: <4589x17366 sparse matrix of type '<class 'numpy.float64'>' with 7 stored elements in COOrdinate format> Note the result is a COO sparse matrix. If you want a csc_matrix you can then use sparse.coo_matrix.tocsc. Now you mention you...

matlab,extract,sparse-matrix,extraction,sparse

[ii jj] = find(A); answer = unique([ii(:); jj(:)]); should do it. Note that the find command with two outputs gives you the row and column index of all nonzero elements. Since you have a minimum spanning tree, each number you care about needs to occur at least once in the...
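
A rough SciPy analogue of the MATLAB one-liner, for illustration only. The matrix `A` below is a hypothetical tree, and the indices are 0-based rather than MATLAB's 1-based.

```python
import numpy as np
from scipy import sparse

# Hypothetical upper-triangular tree on 5 nodes; node 3 is not connected.
A = sparse.csr_matrix(np.array([[0, 2, 0, 0, 0],
                                [0, 0, 1, 0, 0],
                                [0, 0, 0, 0, 4],
                                [0, 0, 0, 0, 0],
                                [0, 0, 0, 0, 0]]))

# Row and column indices of all nonzero entries (0-based, unlike MATLAB).
ii, jj = A.nonzero()

# Every node index that appears in at least one edge.
answer = np.unique(np.concatenate([ii, jj]))
```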

python,numpy,scipy,duplicates,sparse

Creating an intermediary dok matrix works in your example: In [410]: c=sparse.coo_matrix((data, (cols, rows)),shape=(3,3)).todok().tocsc() In [411]: c.A Out[411]: array([[0, 0, 0], [0, 4, 0], [0, 0, 0]], dtype=int32) A coo matrix puts your input arrays into its data,col,row attributes without change. The summing doesn't occur until it is converted to...
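
The point that a COO matrix stores its inputs unchanged, and that duplicates are only summed on conversion, can be sketched as follows. The arrays are hypothetical, not the question's data, and the sketch shows only the default COO-to-CSR behaviour.

```python
import numpy as np
from scipy import sparse

# Hypothetical input with a duplicated (0, 0) coordinate.
data = np.array([1, 2, 3])
rows = np.array([0, 0, 1])
cols = np.array([0, 0, 1])

coo = sparse.coo_matrix((data, (rows, cols)), shape=(2, 2))
# The COO matrix stores all three entries as given.
n_stored_coo = coo.nnz

# Converting to CSR sums the two entries at (0, 0).
csr = coo.tocsr()
```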

numpy,scipy,linear-algebra,sparse

Without knowing the exact error, it's hard to say what's going wrong. I'm not overly familiar with scipy, but I suspect if there was no solution to these problems due to an inconsistent system, you would get a meaningful error. My best guess would be a memory issue. During Gaussian...

python,performance,numpy,scipy,sparse

Perhaps pandas is what you're looking for: d1 = pandas.DataFrame(numpy.array([1, 4]), index=['a', 'b'], dtype="int32") d2 = pandas.DataFrame(numpy.array([2, 2]), index=['a', 'c'], dtype="int32") d1.add(d2, fill_value=0) result: 0 a 3 b 4 c 2 ...

When you use [] to remove values, you are changing the size of the matrix. But in your inner loop you make index_i run up to the initial maximum size of temp. Therefore you are likely reaching a point where index_i is larger than the current size of temp,...

python,numpy,scipy,matrix-multiplication,sparse

Your question initially confused me, since for my version of scipy, A.dot(B) and np.dot(A, B) both work fine; the .dot method of the sparse matrix simply overrides np.dot. However it seems that this feature was added in this pull request, and is not present in versions of scipy older than...

When dealing with a sparse matrix, s, avoid inequalities that include zero since a sparse matrix (if you're using it appropriately) should have a great many zeros and forming an array of all the locations which are zero would be huge. So avoid s <= 2 for example. Use inequalities...
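
As a sketch of a comparison that excludes zero, using a small hypothetical matrix `s`: the test `s > 2` is false at every zero entry, so the boolean result can itself stay sparse.

```python
import numpy as np
from scipy import sparse

s = sparse.csr_matrix(np.array([[0, 3, 0], [5, 0, 1]]))

# s > 2 is false at every zero, so the boolean result stays sparse.
mask = s > 2

# Coordinates of the entries that satisfy the inequality.
rows, cols = mask.nonzero()
```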

I think your problem is due to mixing of dtypes. But you could get around it like this. First, provide only the relevant column to get_dummies() rather than the whole dataframe: df2 = pd.get_dummies(df['cat']).to_sparse(0) After that, you can add other variables back but everything needs to be numeric. A pandas...

python,python-2.7,numpy,scipy,sparse

In the CSR format, the underlying data, indices, and indptr arrays for your desired y are identical to those of your x matrix. You can pass those to the csr_matrix constructor with a new shape: y = csr_matrix((x.data, x.indices, x.indptr), shape=(2, 3)) Note that the constructor defaults to copy=False, so...
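
A runnable sketch of the constructor call above, assuming a hypothetical 2x2 input `x`. Only the column count changes here; the row count has to stay the same, because indptr holds one entry per row plus one.

```python
from scipy.sparse import csr_matrix

# Hypothetical 2x2 input.
x = csr_matrix([[1, 2], [3, 4]])

# Reuse x's three CSR arrays, declaring one extra (empty) column.
y = csr_matrix((x.data, x.indices, x.indptr), shape=(2, 3))
```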

matrix,cuda,multiplication,sparse,cublas

I don't think that you can classify a matrix with half zeros as "sparse": the timings you have found are reasonable (actually the sparse algorithm is behaving pretty well!). Sparse algorithms are efficient only when considering matrices where most of the elements are zeros (for example, matrices coming out from...

python,matrix,scipy,sparse-matrix,sparse

It's not entirely clear what you are asking for, but here's my guess. Let's just experiment with a simple array: Start with 3 arrays (I took these from another sparse matrix, but that isn't important): In [165]: data Out[165]: array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,...

optimization,sparse-matrix,julia-lang,sparse,sparse-array

First, you're making term_doc a global variable, which is a big problem for performance. Pass it as an argument, doSparseWay(term_doc::SparseMatrixCSC). (The type annotation at the beginning of your function does not do anything useful.) You want to use an approach similar to the answer by walnuss: function doSparseWay(term_doc::SparseMatrixCSC) I, J,...

A couple of comments on your code: numel(find(G == 0)) is probably one of the worst ways to determine how many entries are zero in your matrix. I would personally do numel(G) - nnz(G). numel(G) determines how many elements are in G and nnz(G) determines how many non-zero values...
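
The same counting idea carries over to SciPy. This is a sketch of an analogue, not the MATLAB code from the answer, and `G` is a hypothetical matrix.

```python
import numpy as np
from scipy import sparse

G = sparse.csr_matrix(np.array([[0, 1, 0], [2, 0, 0]]))

# Total number of elements, the analogue of numel(G).
total = G.shape[0] * G.shape[1]

# Number of nonzero values, the analogue of nnz(G).
nonzeros = G.count_nonzero()

# Zero entries are counted without ever materialising G == 0.
zeros = total - nonzeros
```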

The sparsity parameter helps you remove those terms which have at least a certain percentage of sparse elements. (Very) roughly speaking, if you want to keep the terms that appear 3% of the time, set the parameter to 0.97. If you want the terms that occur in 70%...

You can save in one of numpy's binary formats, here's one I use: np.savez. You can average with np.sum(a, axis=2) / np.sum(a != 0, axis=2). Keep in mind that this will still give you NaN's when there are zeros in the denominator....
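
A small sketch of the averaging expression, with a hypothetical array `a` in which zeros mark missing values. The np.errstate context is an addition that only silences the divide warnings; positions with no nonzero values still come out as NaN.

```python
import numpy as np

# Hypothetical 3-D array; zeros mark missing values along the last axis.
a = np.array([[[1.0, 3.0, 0.0],
               [0.0, 0.0, 0.0]]])

# Suppress the divide warnings; 0/0 positions become NaN.
with np.errstate(divide="ignore", invalid="ignore"):
    mean_nonzero = np.sum(a, axis=2) / np.sum(a != 0, axis=2)
```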

numpy,sparse-matrix,sparse,sparse-array

A restriction of scipy.sparse matrices is that they represent linear operators and are thus kept two dimensional, which leads to the question: Which operation are you seeking to do? All einsum operations on a pair of 2D matrices are very easy to write without einsum using dot, transpose and pointwise...
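
To illustrate, here is a sketch of a few two-operand einsum patterns rewritten with dot, transpose and pointwise operations. The matrices are hypothetical, and the einsum strings in the comments are the dense equivalents.

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.array([[1.0, 0.0], [2.0, 3.0]]))
B = sparse.csr_matrix(np.array([[0.0, 4.0], [5.0, 0.0]]))

# einsum('ij,jk->ik'): matrix product.
matmul = A.dot(B)

# einsum('ij,kj->ik'): product with the second operand transposed.
matmul_t = A.dot(B.T)

# einsum('ij,ij->ij'): pointwise product.
pointwise = A.multiply(B)

# einsum('ij,ij->'): full contraction to a scalar.
total = A.multiply(B).sum()
```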

matrix,julia-lang,complex-numbers,sparse

Looking at what we can pass to spzeros: julia> methods(spzeros) # 5 methods for generic function "spzeros": spzeros(m::Integer,n::Integer) at sparse/sparsematrix.jl:406 spzeros(Tv::Type{T<:Top},m::Integer,n::Integer) at sparse/sparsematrix.jl:407 spzeros(Tv::Type{T<:Top},Ti::Type{T<:Top},m::Integer,n::Integer) at sparse/sparsematrix.jl:409 spzeros(m::Integer) at deprecated.jl:28 spzeros(Tv::Type{T<:Top},m::Integer) at deprecated.jl:28 We see we should be able to pass a type as the first argument: julia> a =...

Thanks for having clarified your question, try this. Here is sample data with two columns that have three and two levels respectively: set.seed(123) n <- 6 df <- data.frame(x = sample(c("A", "B", "C"), n, TRUE), y = sample(c("D", "E"), n, TRUE)) # x y # 1 A E # 2...

python,scipy,scikit-learn,distance,sparse

CSR is ordered by rows, CSC is ordered by columns. So accessing rows would be faster with CSR and accessing columns would be faster using CSC. Since sklearn.metrics.pairwise.pairwise_distances takes as input X, where the rows are instances and columns are attributes, it will be accessing rows from the sparse matrix....

python,numpy,matrix,scipy,sparse

If the indices are increasing (as appears to be the case from your example), you could use itertools.groupby on an enumerate of the list. For each group, use numpy's indexing. The loop could look like this: import itertools import operator for g, inds in itertools.groupby(enumerate(A), key=operator.itemgetter(1)): ... and the ... should be...

As maintainer of the Matrix package: Using dimnames for sparseMatrix objects is allowed in construction, and column names are even important, notably for sparse model matrices (in glmnet etc.), but for efficiency reasons (and partly for lack of use cases, hence "not yet implemented") they are not always...

matlab,matrix,out-of-memory,normalization,sparse

Yay for linear algebra! Column scaling is right multiplication by a diagonal matrix: X = X*diag(sparse(fac)); ...
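
The identity is easy to check on a small case. Below is a sketch in Python with SciPy rather than MATLAB, where `X` and `fac` are hypothetical values.

```python
import numpy as np
from scipy import sparse

X = sparse.csr_matrix(np.array([[1.0, 2.0], [0.0, 4.0]]))
# Hypothetical per-column scale factors.
fac = np.array([10.0, 0.5])

# Right-multiplying by diag(fac) scales column j by fac[j].
X_scaled = X @ sparse.diags(fac)
```

No dense intermediate is formed, since both factors are sparse.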

scipy,linear-algebra,sparse,eigenvalue

Both eigs and eigsh require that M be positive definite (see the descriptions of M in the docstrings for more details). Your matrix M is not positive definite. Note the negative eigenvalues: In [212]: M Out[212]: array([[ 25.1, 0. , 0. , 17.3, 0. , 0. ], [ 0. ,...
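
One way to test this condition before calling eigs or eigsh is to inspect the eigenvalues of M directly. This is a sketch for small symmetric matrices; `is_positive_definite` is a hypothetical helper, and the matrices below are illustrations, not the ones from the question.

```python
import numpy as np

def is_positive_definite(M):
    """Return True if the symmetric matrix M has only positive eigenvalues."""
    return bool(np.all(np.linalg.eigvalsh(M) > 0))

# Hypothetical matrices, not the ones from the question.
M_good = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 1 and 3
M_bad = np.array([[1.0, 2.0], [2.0, 1.0]])    # eigenvalues -1 and 3
```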