python,performance,numpy,scipy,sparse

Perhaps pandas is what you're looking for: d1 = pandas.DataFrame(numpy.array([1, 4]), index=['a', 'b'], dtype="int32") d2 = pandas.DataFrame(numpy.array([2, 2]), index=['a', 'c'], dtype="int32") d1.add(d2, fill_value=0) result: 0 a 3 b 4 c 2 ...

np.einsum would do it: np.einsum('ij,ij->i', a_vec, b_vec) ...

python,image-processing,numpy,scipy

You could simply try a median filter with a small kernel size, from scipy.ndimage import median_filter filtered_array = median_filter(random_array, size=3) which will remove the specks without noticeably changing the original image. A median filter is well suited for such tasks since it will better preserve features in your original image...

Well, the reason is that the parameters to the two functions are, as you have noted, different. Yes, this makes it really hard to just switch out one for the other, as I well know. Why? In general it was a clear design decision to break backward compatibility with the...

you can try this corr_val=0.01 df2 = df1.corr().unstack().reset_index() df2[df2[0]>corr_val] ...

python,numpy,scipy,nonlinear-optimization

As @Jblasco suggested, you can minimize the sum of squares. scipy.leastsq() is designed for such problems. For your example, the code would be: import scipy.optimize as sopt xx0 = np.array([0., 0., 0.]) # starting point rslt = sopt.leastsq(fun, xx0, full_output=True) print("The solution is {}".format(rslt[0])) Look at the other entries of...

On Windows, you'll need to use the unofficial precompiled binaries: http://www.lfd.uci.edu/~gohlke/pythonlibs/...

Mabye: import numpy as np from matplotlib import pyplot as plt from scipy import ndimage from scipy import optimize %matplotlib inline # just a gaussian (copy paste from lmfit, another great package) def my_gaussian(p,x): amp = p[0] cen = p[1] wid = p[2] return amp * np.exp(-(x-cen)**2 /wid) # I...

Yes. >>> len(set(numpy.roots([1, 6, 9]))) 2 >>> numpy.roots([1, 6, 9]) array([-3. +3.72529030e-08j, -3. -3.72529030e-08j]) ...

This is how I would solve this using a lambda expression, but you can see that the scipy function is faster. from scipy.spatial.distance import squareform, pdist timeit df.apply(lambda x: sum(((x - df) ** 2).sum(axis=1) ** 0.5 <= 3) - 1, axis=1) 100 loops, best of 3: 5.34 ms per loop...

python,numpy,plot,scipy,smooth

One solution would be to use the scipy.iterp1D function with a 'cubic' type : from scipy import interpolate .... s_range = numpy.arange(0,21,1) for graph in graphs: cost = [] for s in s_range: cost.append(calculate_cost(s, graph.h, graph.d, graph.r, graph.k, graph.alphaR)) f = interpolate.interp1d(s_range, cost, kind='cubic') s_range_new = np.arange(0,20, 0.1) cost_new =...

It is because you are using - TemporaryFile , which is a temporary file which is not stored as outfile in your directory. If you want to save it to outfile , you can give that as the name and it will save it to that file. np.save('outfile',x) When saving...

Here are two possible ways: The first version uses fmin_cobyla and therefore does not require the derivative of f. from scipy.optimize import fmin_cobyla f = lambda x : - (x[0]**0.5 * x[1]**(0.5)) # x + y = 10 <=> (x + y - 10 >= 0) & (-x -y +...

In the mstats module of scipy.stats, "missing values" are handled using a masked array. nan does not indicate a missing value. The following shows how you can convert your array y (which uses nan for missing values) into a masked array my: In [48]: x = np.arange(12) In [49]: y...

OK, after some fooling around, we focus on another aspect of good optimization/root finding algorithms. In the comments above we went back and forth around which method in scipy.optimize.root() to use. An equally important question for near-bulletproof 'automatic' root finding is zeroing in on good initial guesses. Often times, good...

python-3.x,scipy,pycharm,anaconda

Explanations how to configure PyCharm with Anaconda can be found in the documentation. In PyCharm preferences you can just select the correct python interpreter under, Project Interpreter > Python Interpreters As pointed out by @Cecilia, in the case when a virtual environment (e.g. named py3k) is used with Anaconda, the...

python,scipy,edge-detection,sobel

This is a difficult image to apply simple edge detection due to the stone and concrete textures. The texture makes it almost as though you have a very noisy image to which you are applying first derivative. You'll end up with many small undesired edges. Here is your code working...

toarray returns an ndarray; todense returns a matrix. If you want a matrix, use todense; otherwise, use toarray.

To evaluate the pdf at abscissas, you would pass abcissas as the first argument to pdf. To specify the parameters, use the * operator to unpack the param tuple and pass those values to distr.pdf: pdf = distr.pdf(abscissas, *param) For example, import numpy as np import scipy.stats as stats distrNameList...

From a programming point of view, the problem appears to be your value of gamma and therefore the size of your collapse operators. Print out gamma - it is of the order 10**25 - this seems to be what is preventing the solver from converging. Just for testing (I'm an...

In my computer your computations take only 0.12 seconds: In [59]: %prun quad(lambda x: x*fn(x), -100,100) 26740 function calls in 0.114 seconds And I get the same result: In [59]: quad(lambda x: x*fn(x), -100,100) Out[59]: (0.9999999999999999, 5.793858187044747e-12) Most of it is due to the fact that fn is a bounded...

python,scipy,triangulation,delaunay

What you are looking for is called constrained Delaunay triangulation, and unfortunately the scipy.spatial implementation does not support it. As you pointed out, triangle does have that feature -- why not use it instead?...

python,random,scipy,sparse-matrix

Looks like the feature that you want was added about two months ago and will be available in scipy 0.16: https://github.com/scipy/scipy/blob/77af8f44bef43a67cb14c247bc230282022ed0c2/scipy/sparse/construct.py#L671 You will be able to call sparse.random(10, 10, 0.1, random_state=RandomState(1), data_fvs=func) where func "should take a single argument specifying the length of the ndarray that it will return. The...

Have you looked at the underlying data representation for this, and other sparse formats? For example, for small matrix In [257]: M = sparse.rand(10,10,.1,format='csr') In [258]: M Out[258]: <10x10 sparse matrix of type '<class 'numpy.float64'>' with 10 stored elements in Compressed Sparse Row format> In [259]: M.data Out[259]: array([ 0.86390256,...

fsolve is a wrapper of MINPACK's hybrd, which requires the function's argument and output have the same number of elements. You can try other algorithms from the more general scipy.optimize.root that do not have this restriction (e.g. lm): from scipy.optimize import fsolve, root def fsolve_function(arguments): x = arguments[0] y =...

I know you asked for some numpy/scipy routine. So this might not be fast enough for your purposes: sh = M2.shape dic = {M1[2*m+r, 2*n+c]: M2[m, n] for r in xrange(2) for c in xrange(2) for m in xrange(sh[0]) for n in xrange(sh[1])} print dic ### {0: 100, 1: 100,...

Can we assume they are the same shape? In [202]: a=sparse.csr_matrix([[0,1],[1,0]]) In [203]: b=sparse.csr_matrix([[0,1],[1,1]]) In [204]: (a!=b).nnz==0 Out[204]: False This checks the sparsity of the inequality array. It will give you an efficiency warning if you try a==b (at least the 1st time you use it). That's because all those...

Try a different method. For me, method="lm" (I'm guessing Levenberg-Marquardt, but I'm not entirely sure) works very well: import numpy as np import scipy.optimize def f(y): w,p1,p2,p3,p4,p5,p6 = y[:7] t1 = w - 0.99006633*(p1**0.5) - (-1.010067)*((1-p1)) t2 = w - 22.7235687*(p2**0.5) - (-1.010067)*((1-p2)) t3 = w - 9.71323491*(p3**0.5) - (-1.010067)*((1-p3))...

Without testing myself (on a tablet right now without a Python shell handy), I believe this is due to somewhat strange behavior related to the starting point for the initialization used by the approximate eigensolver library ARPACK, which is what svds ends up calling. If you follow through the Python...

python,python-3.x,scipy,integration,quad

The reason here is that your function is only very strongly peaked in a very small region of your integration region and is effectively zero everywhere else, quad never finds this peak and thus only see's the integrand being zero. Since in this case you know where the peaks are,...

I assume you want to have a 5 by 5 matrix at the end. also indices start from 0. In [18]:import scipy.sparse as sp In [20]: a = [[0,1,2],[0],[0,3,4]] In [31]: m = sp.lil_matrix((5,5), dtype=int) In [32]: for row_index, col_indices in enumerate(a): m[row_index, col_indices] = 1 ....: In [33]: m.toarray()...

python,python-2.7,matplotlib,scipy

I think the problem here is that spline interpolation of a higher order is not suitable for smoothing your data. Below I plotted spline interpolations of order 0 to 3. What you see is that once you demand continuity of the derivative (order 2 and higher) you run into problems...

The problem is that your data set is not centered. Qhull (used to do the Delaunay triangulation) does not center the data set for you under the default options, so it runs to rounding errors far away from origin. You can center it yourself before triangulation points -= points.mean(axis=0) tri...

Just for reference, your CCt = np.einsum('ij...,i...->ij...',C,C) is the same as CCt1=C[:,None,:]*C[:,:,None] producing a (L,K,K) array. For my smaller test case np.einsum is 2x faster. sparse.block_diag converts each submatrix to coo, and passes them to sparse.bmat. bmat collects the rows, cols, data of all the sub matrices into a big...

You're using python wrong. Before you create any top-level python module or package, you should make sure there isn't already a module or package by that name. The best solution here is to not use top-level modules, and instead put everything in a single top-level package (i.e. directory with an...

python,scipy,interpolation,delaunay

After some experimenting, the solution looks simple (this post was quite helpful): # dimension of the problem (in this example I use 3D grid, # but the method works for any dimension n>=2) n = 3 # my array of grid points (array of n-dimensional coordinates) points = [[1,2,3], [2,3,4],...

signal.argrelmin is a thin wrapper around signal.argrelextrema with comparator=np.less. np.less(a, b) returns the truth value of a < b element-wise. Notice that np.less requires a to be strictly less than b for it to be True. Your data has the same minimum value at a lot of neighboring locations. At...

The last few lines of the traceback indicate the likely problem: the data file is read as a flat (1D) array, and then scipy tries to reshape the array to an (n, 3) array, which fails. That means the size of the flat array is not a multiple of three...

java,optimization,machine-learning,scipy,stanford-nlp

What you have should be just fine. (Have you actually had any problems with it?) Setting termination both on max iterations and max function evaluations is probably overkill, so you might omit the last argument to qn.minimize(), but it seems from the documentation that scipy does use both with a...

python,arrays,matplotlib,scipy,scikit-learn

As @wflynny says above, you need to give np.meshgrid two 1D arrays. We can use X.shape to create your x and y arrays, like this: X=np.zeros((100,85)) # just to get the right shape here print X.shape # (100, 85) x=np.arange(X.shape[0]) y=np.arange(X.shape[1]) print x.shape # (100,) print y.shape # (85,) xx,yy=np.meshgrid(x,y,indexing='ij')...

Use scipy's argrelextrema: >>> import numpy as np >>> from scipy.signal import argrelextrema >>> data = np.array([ 5, 12.3, 12.3, 7, 2, 6, 9, 10, 5, 9, 17, 2 ]) >>> radius = 2 # number of elements to the left and right to compare to >>> argrelextrema(data, np.less, order=radius)...

python,scipy,signal-processing

Looking at your graphs shows that the signal filtered with filtfilt has a peak magnitude of 4.43x107 in the frequency domain compared with 4.56x107 for the signal filtered with lfilter. In other words, the signal filtered with filtfilt has an peak magnitude that is 0.97 that when filtering with Now...

python-2.7,scipy,curve-fitting,outliers,best-fit-curve

You are most probably speaking about recursive regression (which is quite easy in Matlab). For python, try and use the scipy.optimize.curve_fit. For a simple 3 degree polynomial fit, this would work based on numpy.polyfit and poly1d. import numpy as np import matplotlib.pyplot as plt points = np.array([(1, 1), (2, 4),...

arrays,vector,scipy,vectorization,linear-algebra

You could use the meshgrid function from numpy: import numpy as np M=10 N=10 D=1 ux=0.5 uy=0.5 xo=1 yo=1 A=np.empty((M,N,3)) x=range(M) y=range(N) xv, yv = np.meshgrid(x, y, sparse=False, indexing='ij') A[:,:,0]=D*ux - (xv-xo) A[:,:,1]=D*uy - (yv-yo) A[:,:,2]=D ...

Here is an example to use KMeans. from sklearn.datasets import make_blobs from itertools import product import numpy as np import pandas as pd from sklearn.cluster import KMeans # try to simulate your data # ===================================================== X, y = make_blobs(n_samples=1000, n_features=10, centers=3) columns = ['feature' + str(x) for x in np.arange(1,...

Your "trick" to import fftconvolve inside the class definition trips you up in the end. Unless you haven't shown us where you define Smooth.fftconvolve elsewhere. Here's what you have now: class Smooth(object): from scipy.signal import fftconvolve def enlarge_smooth(self): # other stuff self.fftconvolve(vals, gs) And when you call s = S()...

You could use np.einsum to do the operation since it allows for very careful control over which axes are multiplied and which are summed: >>> np.einsum('ijk,ij->ik', ind, dist) array([[ 0.4, 0.4, 0.4, 0.4], [ 3. , 3. , 3. , 3. ], [ 1. , 1. , 1. , 1....

Here is a way to do it (from networkx.pagerank_scipy). It uses scipy linear algebra functions instead of iterating over each row. That will probably be faster for large graphs. In [42]: G=nx.random_geometric_graph(5,0.5) In [43]: M=nx.to_scipy_sparse_matrix(G, nodelist=G.nodes(), dtype=float) In [44]: M.todense() Out[44]: matrix([[ 0., 1., 0., 1., 1.], [ 1., 0.,...

I believe data would be of type - numpy.ndarray You can do type(data) it should comes out as numpy.ndarray . Also , scipy.random.randint() also returns a value of type numpy.ndarray . You may not be able to do lst[[1,2]] , but you can use numpy.ndarray as a subscript to another...

python,math,numpy,matplotlib,scipy

In this case, you might be better of using Sympy, which allows you to obtain the closed form solutions: from IPython.display import display import sympy as sy from sympy.solvers.ode import dsolve import matplotlib.pyplot as plt import numpy as np sy.init_printing() # LaTeX like pretty printing for IPython t = sy.symbols("t",...

I might miss something but I think the curve_fit just works fine. When I compare the residuals obtained by curve_fit to the ones one would obtain using the parameters obtained by excel which you provide in the comments, the python results always lead to lower residuals (code is provided below)....

This solution really focuses on readability over performance - It explicitly calculates and stores the whole n x n distance matrix and therefore cannot be considered efficient. But: It is very concise and readable. import numpy as np from scipy.spatial.distance import pdist, squareform #create n x d matrix (n=observations, d=dimensions)...

python,numpy,matplotlib,scipy,voronoi

The code in the guide can be used to acheive this. http://docs.scipy.org/doc/scipy/reference/tutorial/spatial.html I haven't yet had time to read through and understand all the code, but it "manually" does what I want and it works. Instead of using voronoi_plot_2d(vor) It step by step uses the different parts of vor to...

python,numpy,matrix,scipy,sparse

If the indices are increasing (as appears to be from your example), you could use itertools.groupby on an enumerate of the list. For each group, use numpy's indexing. The loop could look like this: import itertools import operator for g, inds in itertools.groupby(enumerate(A), key=operator.itemgetter(1)): ... and the ... should be...

The obvious thing to do is remove the NaNs from data. Doing so, however, also requires that the corresponding positions in the 2D X, Y location arrays also be removed: X, Y = np.indices(data.shape) mask = ~np.isnan(data) x = X[mask] y = Y[mask] data = data[mask] Now you can use...

Setting bias=False seems to do it: In [3]: scipy.stats.kurtosis(e,bias=False) Out[3]: array([-1.06005831]) ...

python,numpy,scipy,bioinformatics

From the Wikipedia article on the definition of matrix logarithm: A real matrix has a real logarithm if and only if it is invertible and each Jordan block belonging to a negative eigenvalue occurs an even number of times. We'll need access to your matrix so we can compute the...

python,arrays,numpy,scipy,distance

Distances between labeled regions of an image can be calculated with the following code, import itertools from scipy.spatial.distance import cdist # making sure that IDs are integer example_array = np.asarray(example_array, dtype=np.int) # we assume that IDs start from 1, so we have n-1 unique IDs between 1 and n n...

python,scipy,scikit-learn,atlas

Thanks to Andreas Mueller for the tip: doing a local install of anaconda to a directory I owned got around the compilation issues.

python,statistics,scipy,p-value

The degrees of freedom you are passing to the formula are negative. In [6]: import numpy as np from scipy.special import stdtr dof = -2176568 tf = -11.374250 2*stdtr(dof, -np.abs(tf)) Out[6]: nan If positive: In [7]: import numpy as np from scipy.special import stdtr dof = 2176568 tf...

python,numpy,matplotlib,scipy,least-squares

According to the documentation of the function scipy.linalg.lstsq http://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.linalg.lstsq.html the estimated coefficients should be stored in your variable C (the order corresponding to columns in A). To print your equation with estimated coefficients showing 2 digits after decimal point: print 'f(x,y) = {:.2f}x^2+{:.2f}y^2+{:.2f}xy+{:.2f}x+{:.2f}y+{:.2f}'.format(C[4],C[5],C[3],C[1],C[2],C[0]) or: print 'f(x,y) = {4:.2f}x^2+{5:.2f}y^2+{3:.2f}xy+{1:.2f}x+{2:.2f}y+{0:.2f}'.format(*C)...

multidimensional-array,scipy,vectorization,linear-algebra,least-squares

You can gain some speed by making use of the stack of matrices feature of numpy.linalg routines. This doesn't yet work for numpy.linalg.lstsq, but numpy.linalg.svd does, so you can implement lstsq yourself: import numpy as np def stacked_lstsq(L, b, rcond=1e-10): """ Solve L x = b, via SVD least squares...

python,numpy,statistics,scipy,nested-lists

You need to apply it on a numpy.array reflecting the nested lists. from scipy import stats import numpy as np dataset = np.array([[1.5,3.3,2.6,5.8],[1.5,3.2,5.6,1.8],[2.5,3.1,3.6,5.2]]) stats.mstats.zscore(dataset) works fine....

I solved with sympy: from sympy import * x, y = symbols("x y") f = (x ** 2 + y ** 2) res = integrate(f, (y, 20, x-2), (x, 22, 30)) Basically sympy.integrate is able to deal with multiple integrations, also with variable boundaries....

It looks like you're building a matrix of pandas series, and passing it to the function. The function wants a matrix of scalars; you can call it multiple times. These two things are not quite the same. There are (at least) two ways to go here. Using apply You could...

The easiest way to deal with non-cubic integration volumes in scikit-monaco is to re-define your integration function so that it returns 0 outside of the region of integration (see this section of the documentation): def modified_integrand(xs): x, y = xs if norm.ppf(0.001, x, 3) < y < norm.ppf(0.999, x, 3):...

According to the docs, savemat is defined as io.savemat(file_name, mdict, appendmat=True, format='5', long_field_names=False, do_compression=False, oned_as='row') So the 2nd argument is required, and may be provided with or without the mdict=... part. The reason why it expects this to be a dictionary is that it needs to know the name(s) under...

python,statistics,scipy,probability

(1) "Is it from distribution X" is generally a question which can be answered a priori, if at all; a statistical test for it will only tell you "I have a large sample / not a large sample", which may be true but not too useful. If you are trying...

python,arrays,numpy,data-structures,scipy

Well, here are several ways for achieving this: Map Using Python's map built-in function you can do that easily. animal_list = ['cat', 'elephant'] your_list = ['dog', 'horse', 'cat', 'shark', 'cancer', 'elephant'] res = map(lambda item: item in animal_list, your_list) print res Output [False, False, True, False, False, True] List comprehension...

python,scipy,linear-programming

Here is a wrapper that incorporates lower bound rows in the lingprog formulation. Note that more error trapping is necessary (for example, the number of columns of each A matrix need to be equal), this is not meant to be a robust implementation. For proper error trapping, I suggest you...

This should solve this issue: def resol(t, y, par): p1111, p1212, p2121, p1112, p1121, p1122, p1221, p1222, p2122 = y gamma, h, H = par You will get more problems such as that delta needs to be called delta_t (needs renaming) and ComplexWarning: Casting complex values to real discards the...

According to the documentation, numpy.correlate was designed for 1D arrays, while scipy.correlate can accept ND-arrays. The scipy implementation being more general and therefore complex, seem indeed to incur an additional computational overhead. You can compare the C code between numpy and scipy implementations. Another difference, could be for instance, that...

numpy,machine-learning,scipy,hierarchical-clustering

You're on the right track with converting the data into a table like the one on the linked page (a redundant distance matrix). According to the documentation, you should be able to pass that directly into scipy.cluster.hierarchy.linkage or a related function, such as scipy.cluster.hierarchy.single or scipy.cluster.hierarchy.complete. The related functions explicitly...

python,numpy,scipy,curve-fitting,data-fitting

You could simply overwrite your function for the second data set: def power_law2(x, c): return a_one * (x + c) ** b_one x_data_two = np.random.rand(10) y_data_two = np.random.rand(10) c_two = curve_fit(power_law2, x_data_two, y_data_two)[0][0] Or you could use this (it finds optimal a,b for all data and optimal c1 for data_one...

Use UnivariateSpline instead of interp1d, and use the derivative method to generate the first derivative. The example at the manual page here is pretty self-explanatory.

scipy,convex-optimization,convex,quadratic-programming

This is a problem in Quadratic Programming. Python - CVXOPT Matlab - quadprog ...

python,numpy,scipy,sparse-matrix,eigenvalue

tl;dr: You can use the which='LA' flag as described in the documentation. I quote: scipy.sparse.linalg.eigsh(A, k=6, M=None, sigma=None, which='LM', v0=None, ncv=None, maxiter=None, tol=0, return_eigenvectors=True, Minv=None, OPinv=None, mode='normal') Emphasis mine. which : str [‘LM’ | ‘SM’ | ‘LA’ | ‘SA’ | ‘BE’] If A is a complex hermitian matrix, ‘BE’ is...

This syntax worked with RectBivariateSpline x2 = np.linspace(xmin,xmax,c1) y2 = np.linspace(ymin,ymax,r1) f2 = sp.interpolate.RectBivariateSpline(x2, y2, dstnc1.T,kx=1, ky=1) I can then evaluate at new points with this: out2 = f2.ev(xnew1,ynew1) For interp2d I am stuck as I am not able to bypass firewall at my office to update Anaconda (Windows). I...

I'm not familiar with image processing, but by adding a simple print statement to your code, you will see that values in your array are 'nan'. for x in range(steps): for y in range(steps): R = mean(im[x*dx:(x+1)*dx,y*dy:(y+1)*dy,0]) G = mean(im[x*dx:(x+1)*dx,y*dy:(y+1)*dy,1]) B = mean(im[x*dx:(x+1)*dx,y*dy:(y+1)*dy,2]) features.append([R,G,B]) features = array(features,'f') #make into array...

python,arrays,scipy,vectorization,linear-algebra

You should just divide your array by the sqrt of the sum of squares of your array's last dimension. In [1]: import numpy as np In [2]: x = np.random.rand(1000, 500, 3) In [3]: normed = x / np.sqrt((x**2).sum(axis=-1))[:,:,None] #None could be np.newaxis Note that if you want to compute...

python,scipy,curve-fitting,data-fitting

This is very well within reach of scipy.optimize.curve_fit (or just scipy.optimize.leastsqr). The fact that a sum is involved does not matter at all, nor that you have arrays of parameters. The only thing to note is that curve_fit wants to give your fit function the parameters as individual arguments, while...

optimize.linprog always minimizes your target function. If you want to maximize instead, you can use that max(f(x)) == -min(-f(x)) from scipy import optimize optimize.linprog( c = [-1, -2], A_ub=[[1, 1]], b_ub=[6], bounds=(1, 5), method='simplex' ) This will give you your expected result, with the value -f(x) = -11.0 slack: array([...

python,numpy,scipy,distribution

The loc parameter always shifts the x variable. In other words, it generalizes the distribution to allow shifting x=0 to x=loc. So that when loc is nonzero, maxwell.pdf(x) = sqrt(2/pi)x**2 * exp(-x**2/2), for x > 0 becomes maxwell.pdf(x, loc) = sqrt(2/pi)(x-loc)**2 * exp(-(x-loc)**2/2), for x > loc. The doc string...

Your data contains several y values for the same x value. This violates the assumptions of most interpolation algorithms. Either discard the rows with duplicate x values, average the y values for each individual x, or obtain a better resolution for the x values such that they aren't the same...

python,statistics,scipy,p-value

Re what happens here internally. Well, the Student t distribution is defined for dof > 0, at least in scipy.stats: http://docs.scipy.org/doc/scipy-dev/reference/generated/scipy.stats.t.html. Hence a nan: In [11]: stats.t.sf(-11, df=10) Out[11]: 0.99999967038443183 In [12]: stats.t.sf(-11, df=-10) Out[12]: nan ...

python,machine-learning,scipy,scikit-learn,pipeline

You can write your own transformer that'll transform input into predictions. Something like this: class PredictionTransformer(sklearn.base.BaseEstimator, sklearn.base.TransformerMixin): def __init__(self, estimator): self.estimator = estimator def fit(self, X, y): self.estimator.fit(X, y) return self def transform(self, X): return self.estimator.predict_proba(X) Then you can use FeatureUnion to glue your transformers together. That said, there's a...

python,pandas,scipy,scikit-learn

You need to define an unambiguous mapping from the row/column names to some indices (it is not important whether "Apple" is "0", or "1", just that it is represented by a number, hence this won't exactly match your result, but it should not matter). In this example, 'info.txt' contains Apple,Google,1...

python,python-2.7,numpy,scipy,sparse

In the CSR format, the underlying data, indices, and indptr arrays for your desired y are identical to those of your x matrix. You can pass those to the csr_matrix constructor with a new shape: y = csr_matrix((x.data, x.indices, x.indptr), shape=(2, 3)) Note that the constructor defaults to copy=False, so...

np.fromfile takes a "count" argument, which specifies how many items to read. If you know the number of integers in the header in advance, a simple way to do what you want without any type conversions would just be to read the header as integers, followed by the rest of...