python,statistics,scipy,normal-distribution,cdf

edit: you actually need import norm from scipy.stats. I found the answer. You need to use ppf in scipy.stats which stands for "percent point function". So let's say you have a normal distribution with stdDev = 1, and mean = 0 and you want to find the value at which...

Have you looked at the underlying data representation for this, and other sparse formats? For example, for small matrix In [257]: M = sparse.rand(10,10,.1,format='csr') In [258]: M Out[258]: <10x10 sparse matrix of type '<class 'numpy.float64'>' with 10 stored elements in Compressed Sparse Row format> In [259]: M.data Out[259]: array([ 0.86390256,...

python-3.x,scipy,pycharm,anaconda

Explanations how to configure PyCharm with Anaconda can be found in the documentation. In PyCharm preferences you can just select the correct python interpreter under, Project Interpreter > Python Interpreters As pointed out by @Cecilia, in the case when a virtual environment (e.g. named py3k) is used with Anaconda, the...

python,numpy,matplotlib,scipy,least-squares

According to the documentation of the function scipy.linalg.lstsq http://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.linalg.lstsq.html the estimated coefficients should be stored in your variable C (the order corresponding to columns in A). To print your equation with estimated coefficients showing 2 digits after decimal point: print 'f(x,y) = {:.2f}x^2+{:.2f}y^2+{:.2f}xy+{:.2f}x+{:.2f}y+{:.2f}'.format(C[4],C[5],C[3],C[1],C[2],C[0]) or: print 'f(x,y) = {4:.2f}x^2+{5:.2f}y^2+{3:.2f}xy+{1:.2f}x+{2:.2f}y+{0:.2f}'.format(*C)...

In my computer your computations take only 0.12 seconds: In [59]: %prun quad(lambda x: x*fn(x), -100,100) 26740 function calls in 0.114 seconds And I get the same result: In [59]: quad(lambda x: x*fn(x), -100,100) Out[59]: (0.9999999999999999, 5.793858187044747e-12) Most of it is due to the fact that fn is a bounded...

From the docs for coo_matrix: | Intended Usage | - COO is a fast format for constructing sparse matrices | - Once a matrix has been constructed, convert to CSR or | CSC format for fast arithmetic and matrix vector operations | - By default when converting to CSR or...

Mabye: import numpy as np from matplotlib import pyplot as plt from scipy import ndimage from scipy import optimize %matplotlib inline # just a gaussian (copy paste from lmfit, another great package) def my_gaussian(p,x): amp = p[0] cen = p[1] wid = p[2] return amp * np.exp(-(x-cen)**2 /wid) # I...

np.fromfile takes a "count" argument, which specifies how many items to read. If you know the number of integers in the header in advance, a simple way to do what you want without any type conversions would just be to read the header as integers, followed by the rest of...

python,numpy,scipy,sparse-matrix,eigenvalue

tl;dr: You can use the which='LA' flag as described in the documentation. I quote: scipy.sparse.linalg.eigsh(A, k=6, M=None, sigma=None, which='LM', v0=None, ncv=None, maxiter=None, tol=0, return_eigenvectors=True, Minv=None, OPinv=None, mode='normal') Emphasis mine. which : str [‘LM’ | ‘SM’ | ‘LA’ | ‘SA’ | ‘BE’] If A is a complex hermitian matrix, ‘BE’ is...

The obvious thing to do is remove the NaNs from data. Doing so, however, also requires that the corresponding positions in the 2D X, Y location arrays also be removed: X, Y = np.indices(data.shape) mask = ~np.isnan(data) x = X[mask] y = Y[mask] data = data[mask] Now you can use...

Use scipy's argrelextrema: >>> import numpy as np >>> from scipy.signal import argrelextrema >>> data = np.array([ 5, 12.3, 12.3, 7, 2, 6, 9, 10, 5, 9, 17, 2 ]) >>> radius = 2 # number of elements to the left and right to compare to >>> argrelextrema(data, np.less, order=radius)...

python,for-loop,numpy,matrix,scipy

eval_hermite broadcasts n with x, then evaluates Hn(x) at each point. Thus, the output shape will be the result of broadcasting n with x. So, if you want to make this work, you'll have to make n and x have compatible shapes: import scipy.special as ss import numpy as np...

This syntax worked with RectBivariateSpline x2 = np.linspace(xmin,xmax,c1) y2 = np.linspace(ymin,ymax,r1) f2 = sp.interpolate.RectBivariateSpline(x2, y2, dstnc1.T,kx=1, ky=1) I can then evaluate at new points with this: out2 = f2.ev(xnew1,ynew1) For interp2d I am stuck as I am not able to bypass firewall at my office to update Anaconda (Windows). I...

python,image-processing,numpy,scipy

You could simply try a median filter with a small kernel size, from scipy.ndimage import median_filter filtered_array = median_filter(random_array, size=3) which will remove the specks without noticeably changing the original image. A median filter is well suited for such tasks since it will better preserve features in your original image...

Yes. >>> len(set(numpy.roots([1, 6, 9]))) 2 >>> numpy.roots([1, 6, 9]) array([-3. +3.72529030e-08j, -3. -3.72529030e-08j]) ...

Without testing myself (on a tablet right now without a Python shell handy), I believe this is due to somewhat strange behavior related to the starting point for the initialization used by the approximate eigensolver library ARPACK, which is what svds ends up calling. If you follow through the Python...

fsolve is a wrapper of MINPACK's hybrd, which requires the function's argument and output have the same number of elements. You can try other algorithms from the more general scipy.optimize.root that do not have this restriction (e.g. lm): from scipy.optimize import fsolve, root def fsolve_function(arguments): x = arguments[0] y =...

From a programming point of view, the problem appears to be your value of gamma and therefore the size of your collapse operators. Print out gamma - it is of the order 10**25 - this seems to be what is preventing the solver from converging. Just for testing (I'm an...

I believe data would be of type - numpy.ndarray You can do type(data) it should comes out as numpy.ndarray . Also , scipy.random.randint() also returns a value of type numpy.ndarray . You may not be able to do lst[[1,2]] , but you can use numpy.ndarray as a subscript to another...

I might miss something but I think the curve_fit just works fine. When I compare the residuals obtained by curve_fit to the ones one would obtain using the parameters obtained by excel which you provide in the comments, the python results always lead to lower residuals (code is provided below)....

Here are two possible ways: The first version uses fmin_cobyla and therefore does not require the derivative of f. from scipy.optimize import fmin_cobyla f = lambda x : - (x[0]**0.5 * x[1]**(0.5)) # x + y = 10 <=> (x + y - 10 >= 0) & (-x -y +...

np.einsum would do it: np.einsum('ij,ij->i', a_vec, b_vec) ...

python,scipy,linear-programming

Here is a wrapper that incorporates lower bound rows in the lingprog formulation. Note that more error trapping is necessary (for example, the number of columns of each A matrix need to be equal), this is not meant to be a robust implementation. For proper error trapping, I suggest you...

python,numpy,scipy,nonlinear-optimization

As @Jblasco suggested, you can minimize the sum of squares. scipy.leastsq() is designed for such problems. For your example, the code would be: import scipy.optimize as sopt xx0 = np.array([0., 0., 0.]) # starting point rslt = sopt.leastsq(fun, xx0, full_output=True) print("The solution is {}".format(rslt[0])) Look at the other entries of...

python,numpy,scipy,bioinformatics

From the Wikipedia article on the definition of matrix logarithm: A real matrix has a real logarithm if and only if it is invertible and each Jordan block belonging to a negative eigenvalue occurs an even number of times. We'll need access to your matrix so we can compute the...

python,arrays,matplotlib,scipy,scikit-learn

As @wflynny says above, you need to give np.meshgrid two 1D arrays. We can use X.shape to create your x and y arrays, like this: X=np.zeros((100,85)) # just to get the right shape here print X.shape # (100, 85) x=np.arange(X.shape[0]) y=np.arange(X.shape[1]) print x.shape # (100,) print y.shape # (85,) xx,yy=np.meshgrid(x,y,indexing='ij')...

python,arrays,scipy,vectorization,linear-algebra

You should just divide your array by the sqrt of the sum of squares of your array's last dimension. In [1]: import numpy as np In [2]: x = np.random.rand(1000, 500, 3) In [3]: normed = x / np.sqrt((x**2).sum(axis=-1))[:,:,None] #None could be np.newaxis Note that if you want to compute...

python,scipy,edge-detection,sobel

This is a difficult image to apply simple edge detection due to the stone and concrete textures. The texture makes it almost as though you have a very noisy image to which you are applying first derivative. You'll end up with many small undesired edges. Here is your code working...

Can we assume they are the same shape? In [202]: a=sparse.csr_matrix([[0,1],[1,0]]) In [203]: b=sparse.csr_matrix([[0,1],[1,1]]) In [204]: (a!=b).nnz==0 Out[204]: False This checks the sparsity of the inequality array. It will give you an efficiency warning if you try a==b (at least the 1st time you use it). That's because all those...

python,numpy,scipy,distribution

The loc parameter always shifts the x variable. In other words, it generalizes the distribution to allow shifting x=0 to x=loc. So that when loc is nonzero, maxwell.pdf(x) = sqrt(2/pi)x**2 * exp(-x**2/2), for x > 0 becomes maxwell.pdf(x, loc) = sqrt(2/pi)(x-loc)**2 * exp(-(x-loc)**2/2), for x > loc. The doc string...

python,statistics,scipy,p-value

The degrees of freedom you are passing to the formula are negative. In [6]: import numpy as np from scipy.special import stdtr dof = -2176568 tf = -11.374250 2*stdtr(dof, -np.abs(tf)) Out[6]: nan If positive: In [7]: import numpy as np from scipy.special import stdtr dof = 2176568 tf...

python,scipy,curve-fitting,data-fitting

This is very well within reach of scipy.optimize.curve_fit (or just scipy.optimize.leastsqr). The fact that a sum is involved does not matter at all, nor that you have arrays of parameters. The only thing to note is that curve_fit wants to give your fit function the parameters as individual arguments, while...

I assume you want to have a 5 by 5 matrix at the end. also indices start from 0. In [18]:import scipy.sparse as sp In [20]: a = [[0,1,2],[0],[0,3,4]] In [31]: m = sp.lil_matrix((5,5), dtype=int) In [32]: for row_index, col_indices in enumerate(a): m[row_index, col_indices] = 1 ....: In [33]: m.toarray()...

python,numpy,matplotlib,scipy,voronoi

The code in the guide can be used to acheive this. http://docs.scipy.org/doc/scipy/reference/tutorial/spatial.html I haven't yet had time to read through and understand all the code, but it "manually" does what I want and it works. Instead of using voronoi_plot_2d(vor) It step by step uses the different parts of vor to...

signal.argrelmin is a thin wrapper around signal.argrelextrema with comparator=np.less. np.less(a, b) returns the truth value of a < b element-wise. Notice that np.less requires a to be strictly less than b for it to be True. Your data has the same minimum value at a lot of neighboring locations. At...

python,arrays,numpy,data-structures,scipy

Well, here are several ways for achieving this: Map Using Python's map built-in function you can do that easily. animal_list = ['cat', 'elephant'] your_list = ['dog', 'horse', 'cat', 'shark', 'cancer', 'elephant'] res = map(lambda item: item in animal_list, your_list) print res Output [False, False, True, False, False, True] List comprehension...

This solution really focuses on readability over performance - It explicitly calculates and stores the whole n x n distance matrix and therefore cannot be considered efficient. But: It is very concise and readable. import numpy as np from scipy.spatial.distance import pdist, squareform #create n x d matrix (n=observations, d=dimensions)...

python,machine-learning,scipy,scikit-learn,pipeline

You can write your own transformer that'll transform input into predictions. Something like this: class PredictionTransformer(sklearn.base.BaseEstimator, sklearn.base.TransformerMixin): def __init__(self, estimator): self.estimator = estimator def fit(self, X, y): self.estimator.fit(X, y) return self def transform(self, X): return self.estimator.predict_proba(X) Then you can use FeatureUnion to glue your transformers together. That said, there's a...

The problem is that your data set is not centered. Qhull (used to do the Delaunay triangulation) does not center the data set for you under the default options, so it runs to rounding errors far away from origin. You can center it yourself before triangulation points -= points.mean(axis=0) tri...

According to the documentation, numpy.correlate was designed for 1D arrays, while scipy.correlate can accept ND-arrays. The scipy implementation being more general and therefore complex, seem indeed to incur an additional computational overhead. You can compare the C code between numpy and scipy implementations. Another difference, could be for instance, that...

arrays,vector,scipy,vectorization,linear-algebra

You could use the meshgrid function from numpy: import numpy as np M=10 N=10 D=1 ux=0.5 uy=0.5 xo=1 yo=1 A=np.empty((M,N,3)) x=range(M) y=range(N) xv, yv = np.meshgrid(x, y, sparse=False, indexing='ij') A[:,:,0]=D*ux - (xv-xo) A[:,:,1]=D*uy - (yv-yo) A[:,:,2]=D ...

In the mstats module of scipy.stats, "missing values" are handled using a masked array. nan does not indicate a missing value. The following shows how you can convert your array y (which uses nan for missing values) into a masked array my: In [48]: x = np.arange(12) In [49]: y...

You're using python wrong. Before you create any top-level python module or package, you should make sure there isn't already a module or package by that name. The best solution here is to not use top-level modules, and instead put everything in a single top-level package (i.e. directory with an...

Here is a way to do it (from networkx.pagerank_scipy). It uses scipy linear algebra functions instead of iterating over each row. That will probably be faster for large graphs. In [42]: G=nx.random_geometric_graph(5,0.5) In [43]: M=nx.to_scipy_sparse_matrix(G, nodelist=G.nodes(), dtype=float) In [44]: M.todense() Out[44]: matrix([[ 0., 1., 0., 1., 1.], [ 1., 0.,...

python,scipy,scikit-learn,atlas

Thanks to Andreas Mueller for the tip: doing a local install of anaconda to a directory I owned got around the compilation issues.

Your data contains several y values for the same x value. This violates the assumptions of most interpolation algorithms. Either discard the rows with duplicate x values, average the y values for each individual x, or obtain a better resolution for the x values such that they aren't the same...

To evaluate the pdf at abscissas, you would pass abcissas as the first argument to pdf. To specify the parameters, use the * operator to unpack the param tuple and pass those values to distr.pdf: pdf = distr.pdf(abscissas, *param) For example, import numpy as np import scipy.stats as stats distrNameList...

python,numpy,statistics,scipy,nested-lists

You need to apply it on a numpy.array reflecting the nested lists. from scipy import stats import numpy as np dataset = np.array([[1.5,3.3,2.6,5.8],[1.5,3.2,5.6,1.8],[2.5,3.1,3.6,5.2]]) stats.mstats.zscore(dataset) works fine....

It is because you are using - TemporaryFile , which is a temporary file which is not stored as outfile in your directory. If you want to save it to outfile , you can give that as the name and it will save it to that file. np.save('outfile',x) When saving...

python,numpy,grid,scipy,reshape

Griddata in matplotlib may be more in line with your requirements. You can do the following, from numpy.random import uniform, seed from matplotlib.mlab import griddata import matplotlib.pyplot as plt import numpy as np datatype = 'grid' npts = 900 xs = uniform(-2, 2, npts) ys = uniform(-2, 2, npts) #...

OK, after some fooling around, we focus on another aspect of good optimization/root finding algorithms. In the comments above we went back and forth around which method in scipy.optimize.root() to use. An equally important question for near-bulletproof 'automatic' root finding is zeroing in on good initial guesses. Often times, good...

python,performance,optimization,scipy,sparse-matrix

First of all don't name your array dict, it is confusing as well as hides the built-in type dict. The problem here is that you're doing everything in quadratic time, so convert your arrays dict and id to a dictionary first where each word or id point to its index....

python,python-2.7,matplotlib,scipy

I think the problem here is that spline interpolation of a higher order is not suitable for smoothing your data. Below I plotted spline interpolations of order 0 to 3. What you see is that once you demand continuity of the derivative (order 2 and higher) you run into problems...

python,numpy,matrix,scipy,sparse

If the indices are increasing (as appears to be from your example), you could use itertools.groupby on an enumerate of the list. For each group, use numpy's indexing. The loop could look like this: import itertools import operator for g, inds in itertools.groupby(enumerate(A), key=operator.itemgetter(1)): ... and the ... should be...

python,pandas,scipy,scikit-learn

You need to define an unambiguous mapping from the row/column names to some indices (it is not important whether "Apple" is "0", or "1", just that it is represented by a number, hence this won't exactly match your result, but it should not matter). In this example, 'info.txt' contains Apple,Google,1...

Try a different method. For me, method="lm" (I'm guessing Levenberg-Marquardt, but I'm not entirely sure) works very well: import numpy as np import scipy.optimize def f(y): w,p1,p2,p3,p4,p5,p6 = y[:7] t1 = w - 0.99006633*(p1**0.5) - (-1.010067)*((1-p1)) t2 = w - 22.7235687*(p2**0.5) - (-1.010067)*((1-p2)) t3 = w - 9.71323491*(p3**0.5) - (-1.010067)*((1-p3))...

python,statistics,scipy,p-value

Re what happens here internally. Well, the Student t distribution is defined for dof > 0, at least in scipy.stats: http://docs.scipy.org/doc/scipy-dev/reference/generated/scipy.stats.t.html. Hence a nan: In [11]: stats.t.sf(-11, df=10) Out[11]: 0.99999967038443183 In [12]: stats.t.sf(-11, df=-10) Out[12]: nan ...

Setting bias=False seems to do it: In [3]: scipy.stats.kurtosis(e,bias=False) Out[3]: array([-1.06005831]) ...

python,scipy,interpolation,delaunay

After some experimenting, the solution looks simple (this post was quite helpful): # dimension of the problem (in this example I use 3D grid, # but the method works for any dimension n>=2) n = 3 # my array of grid points (array of n-dimensional coordinates) points = [[1,2,3], [2,3,4],...

You can groupy the 'ITEM' and 'CATEGORY' columns and then call apply on the df groupby object and pass the function mode. We can then call reset_index and pass param drop=True so that the multi-index is not added back as a column as you already have those columns: In [161]:...

You could use np.einsum to do the operation since it allows for very careful control over which axes are multiplied and which are summed: >>> np.einsum('ijk,ij->ik', ind, dist) array([[ 0.4, 0.4, 0.4, 0.4], [ 3. , 3. , 3. , 3. ], [ 1. , 1. , 1. , 1....

This should solve this issue: def resol(t, y, par): p1111, p1212, p2121, p1112, p1121, p1122, p1221, p1222, p2122 = y gamma, h, H = par You will get more problems such as that delta needs to be called delta_t (needs renaming) and ComplexWarning: Casting complex values to real discards the...

java,optimization,machine-learning,scipy,stanford-nlp

What you have should be just fine. (Have you actually had any problems with it?) Setting termination both on max iterations and max function evaluations is probably overkill, so you might omit the last argument to qn.minimize(), but it seems from the documentation that scipy does use both with a...

multidimensional-array,scipy,vectorization,linear-algebra,least-squares

You can gain some speed by making use of the stack of matrices feature of numpy.linalg routines. This doesn't yet work for numpy.linalg.lstsq, but numpy.linalg.svd does, so you can implement lstsq yourself: import numpy as np def stacked_lstsq(L, b, rcond=1e-10): """ Solve L x = b, via SVD least squares...

scipy,convex-optimization,convex,quadratic-programming

This is a problem in Quadratic Programming. Python - CVXOPT Matlab - quadprog ...

python,random,scipy,sparse-matrix

Looks like the feature that you want was added about two months ago and will be available in scipy 0.16: https://github.com/scipy/scipy/blob/77af8f44bef43a67cb14c247bc230282022ed0c2/scipy/sparse/construct.py#L671 You will be able to call sparse.random(10, 10, 0.1, random_state=RandomState(1), data_fvs=func) where func "should take a single argument specifying the length of the ndarray that it will return. The...

python,optimization,machine-learning,scipy,theano

Each call to train_fn is not necessarily a single training epoch. I'm not exactly sure how fmin_cg is implemented, but in general, conjugate gradient methods may call the cost or gradient function more than once per minimziation step. This is (as far as I understand it) required sometimes to find...

python,performance,numpy,scipy,sparse

Perhaps pandas is what you're looking for: d1 = pandas.DataFrame(numpy.array([1, 4]), index=['a', 'b'], dtype="int32") d2 = pandas.DataFrame(numpy.array([2, 2]), index=['a', 'c'], dtype="int32") d1.add(d2, fill_value=0) result: 0 a 3 b 4 c 2 ...

On Windows, you'll need to use the unofficial precompiled binaries: http://www.lfd.uci.edu/~gohlke/pythonlibs/...

Here is an example to use KMeans. from sklearn.datasets import make_blobs from itertools import product import numpy as np import pandas as pd from sklearn.cluster import KMeans # try to simulate your data # ===================================================== X, y = make_blobs(n_samples=1000, n_features=10, centers=3) columns = ['feature' + str(x) for x in np.arange(1,...

python,numpy,statistics,scipy,missing-data

You can not mathematically perform a test statistic based on nan. Unless you find proof/documentation of special treatment of nan, you can not rely on that. My experience is that in general, even numpy does not treat nan specially, for example for median. Instead the results are whatever they happen...

python,statistics,scipy,probability

(1) "Is it from distribution X" is generally a question which can be answered a priori, if at all; a statistical test for it will only tell you "I have a large sample / not a large sample", which may be true but not too useful. If you are trying...

I know you asked for some numpy/scipy routine. So this might not be fast enough for your purposes: sh = M2.shape dic = {M1[2*m+r, 2*n+c]: M2[m, n] for r in xrange(2) for c in xrange(2) for m in xrange(sh[0]) for n in xrange(sh[1])} print dic ### {0: 100, 1: 100,...

It looks like you're building a matrix of pandas series, and passing it to the function. The function wants a matrix of scalars; you can call it multiple times. These two things are not quite the same. There are (at least) two ways to go here. Using apply You could...

The last few lines of the traceback indicate the likely problem: the data file is read as a flat (1D) array, and then scipy tries to reshape the array to an (n, 3) array, which fails. That means the size of the flat array is not a multiple of three...

python,scipy,signal-processing

Looking at your graphs shows that the signal filtered with filtfilt has a peak magnitude of 4.43x107 in the frequency domain compared with 4.56x107 for the signal filtered with lfilter. In other words, the signal filtered with filtfilt has an peak magnitude that is 0.97 that when filtering with Now...

Well, the reason is that the parameters to the two functions are, as you have noted, different. Yes, this makes it really hard to just switch out one for the other, as I well know. Why? In general it was a clear design decision to break backward compatibility with the...

python,python-2.7,numpy,scipy,sparse

In the CSR format, the underlying data, indices, and indptr arrays for your desired y are identical to those of your x matrix. You can pass those to the csr_matrix constructor with a new shape: y = csr_matrix((x.data, x.indices, x.indptr), shape=(2, 3)) Note that the constructor defaults to copy=False, so...

python,parallel-processing,scipy,race-condition

This appears to be a known bug in scipy: See this and this discussion on github. Some workarounds are suggested in these discussions: 1) Execute a single run of the script -- so that the cache file is filled -- and then execute the other runs in parallel. The parallel...

python,scipy,triangulation,delaunay

What you are looking for is called constrained Delaunay triangulation, and unfortunately the scipy.spatial implementation does not support it. As you pointed out, triangle does have that feature -- why not use it instead?...

According to the docs, savemat is defined as io.savemat(file_name, mdict, appendmat=True, format='5', long_field_names=False, do_compression=False, oned_as='row') So the 2nd argument is required, and may be provided with or without the mdict=... part. The reason why it expects this to be a dictionary is that it needs to know the name(s) under...

Just for reference, your CCt = np.einsum('ij...,i...->ij...',C,C) is the same as CCt1=C[:,None,:]*C[:,:,None] producing a (L,K,K) array. For my smaller test case np.einsum is 2x faster. sparse.block_diag converts each submatrix to coo, and passes them to sparse.bmat. bmat collects the rows, cols, data of all the sub matrices into a big...

python,math,numpy,matplotlib,scipy

In this case, you might be better of using Sympy, which allows you to obtain the closed form solutions: from IPython.display import display import sympy as sy from sympy.solvers.ode import dsolve import matplotlib.pyplot as plt import numpy as np sy.init_printing() # LaTeX like pretty printing for IPython t = sy.symbols("t",...

toarray returns an ndarray; todense returns a matrix. If you want a matrix, use todense; otherwise, use toarray.

Your "trick" to import fftconvolve inside the class definition trips you up in the end. Unless you haven't shown us where you define Smooth.fftconvolve elsewhere. Here's what you have now: class Smooth(object): from scipy.signal import fftconvolve def enlarge_smooth(self): # other stuff self.fftconvolve(vals, gs) And when you call s = S()...

This is how I would solve this using a lambda expression, but you can see that the scipy function is faster. from scipy.spatial.distance import squareform, pdist timeit df.apply(lambda x: sum(((x - df) ** 2).sum(axis=1) ** 0.5 <= 3) - 1, axis=1) 100 loops, best of 3: 5.34 ms per loop...

numpy,machine-learning,scipy,hierarchical-clustering

You're on the right track with converting the data into a table like the one on the linked page (a redundant distance matrix). According to the documentation, you should be able to pass that directly into scipy.cluster.hierarchy.linkage or a related function, such as scipy.cluster.hierarchy.single or scipy.cluster.hierarchy.complete. The related functions explicitly...

python,arrays,numpy,scipy,correlation

Correlation (default 'valid' case) between two 2D arrays: You can simply use matrix-multiplication np.dot like so - out = np.dot(arr_one,arr_two.T) Correlation with the default "valid" case between each pairwise row combinations (row1,row2) of the two input arrays would correspond to multiplication result at each (row1,row2) position. Row-wise Correlation Coefficient calculation...

python,numpy,scipy,curve-fitting,data-fitting

You could simply overwrite your function for the second data set: def power_law2(x, c): return a_one * (x + c) ** b_one x_data_two = np.random.rand(10) y_data_two = np.random.rand(10) c_two = curve_fit(power_law2, x_data_two, y_data_two)[0][0] Or you could use this (it finds optimal a,b for all data and optimal c1 for data_one...

I'm not familiar with image processing, but by adding a simple print statement to your code, you will see that values in your array are 'nan'. for x in range(steps): for y in range(steps): R = mean(im[x*dx:(x+1)*dx,y*dy:(y+1)*dy,0]) G = mean(im[x*dx:(x+1)*dx,y*dy:(y+1)*dy,1]) B = mean(im[x*dx:(x+1)*dx,y*dy:(y+1)*dy,2]) features.append([R,G,B]) features = array(features,'f') #make into array...

Use UnivariateSpline instead of interp1d, and use the derivative method to generate the first derivative. The example at the manual page here is pretty self-explanatory.

python,numpy,plot,scipy,smooth

One solution would be to use the scipy.iterp1D function with a 'cubic' type : from scipy import interpolate .... s_range = numpy.arange(0,21,1) for graph in graphs: cost = [] for s in s_range: cost.append(calculate_cost(s, graph.h, graph.d, graph.r, graph.k, graph.alphaR)) f = interpolate.interp1d(s_range, cost, kind='cubic') s_range_new = np.arange(0,20, 0.1) cost_new =...

I solved with sympy: from sympy import * x, y = symbols("x y") f = (x ** 2 + y ** 2) res = integrate(f, (y, 20, x-2), (x, 22, 30)) Basically sympy.integrate is able to deal with multiple integrations, also with variable boundaries....