java,classification,distribution,decision-tree

Looking at the javadocs I found this https://docs.oracle.com/javase/8/docs/api/java/util/Random.html#nextGaussian-- turns out the random number generator actually can implement many distributions. I'm not sure why they call it Random.nextGaussian() as it has mean=0 and std=1; it even says in the docs it's actually a normal distribution. edit since you have a mean...

The lower and upper bounds specify the range over which the probability is uniform. For example, imagine you go to a bus stop where the bus arrives once every five minutes. If you walk to the bus stop at random times, your wait at the stop will have a lower...

It shouldn't matter much, although being a multiple of N (or close to one) will help around the edges, especially for smaller M. The more important thing is that your random numbers are evenly distributed between 0 and M.

excel,distribution,gaussian,normal-distribution

The results from NORM.DIST are correct... if you directly implement the Gaussian function in your sheet using: =(1/($F$8*SQRT(2*PI())))*EXP(-((M3-$F$7)^2)/(2*$F$8^2)) which is an implementation of the standard Gaussian function e.g. f(x) on: http://mathworld.wolfram.com/GaussianFunction.html then the results exactly match Excel's NORM.DIST built in function. When you say the values "should be" in...
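As a cross-check outside the spreadsheet, the same Gaussian density can be sketched in Python. The inputs below are hypothetical stand-ins for the cells $F$7 (mean), $F$8 (standard deviation) and M3 (x); they are not values from the original sheet.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    coefficient = 1.0 / (sd * math.sqrt(2.0 * math.pi))
    exponent = -((x - mean) ** 2) / (2.0 * sd ** 2)
    return coefficient * math.exp(exponent)

# Hypothetical inputs standing in for $F$7 (mean), $F$8 (sd) and M3 (x).
mean, sd, x = 10.0, 2.0, 11.5
print(normal_pdf(x, mean, sd))      # hand-written formula
print(NormalDist(mean, sd).pdf(x))  # standard-library value for comparison
```

The two printed values should agree to floating-point precision, which is the same agreement the answer reports between the hand-written formula and NORM.DIST.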

algorithm,geometry,distribution,computational-geometry,mathematical-optimization

This is a linear program that can be solved by general LP solvers. It can also be modelled more specifically as a min-cost max-flow problem: Put the attractors on the left side of a bipartite graph and the points on the right side. Connect every attractor i on the left...

r,distribution,modeling,mixture-model,weibull

This error message is entirely as expected. When you run this program it provides a followup warning message: "numerical integration failed, ignore previous messages, optimisation will try again" which implies you can simply ignore this error from the "integrate" function. Why is this normal? Well that explanation needs some background...

plot,distribution,density-plot

hist is a nice way to see a distribution. See here for details.

Start is the initial guess for the parameters of your distribution. There are logs involved because it is using maximum likelihood and hence log-likelihoods. library(fitdistrplus) dat <- rt(100, df=10) fit <- fitdist(dat, "t", start=list(df=2)) ...

matlab,random,distribution,gaussian,sampling

Might use Irwin-Hall from https://en.wikipedia.org/wiki/Irwin%E2%80%93Hall_distribution Basically, min(IH(n)) = 0 max(IH(n)) = n peak(IH(n)) = n/2 Scaling to your [1.9...2.1] range v = 1.9 + ((2.1-1.9)/n) * IH(n) It is bounded, very easy to sample, and at large n it is pretty much gaussian. You could vary n to get narrow...
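A minimal Python sketch of the scaling described above, assuming IH(n) is drawn as the sum of n independent uniforms on [0, 1). The function name and the choice n = 12 are illustrative only.

```python
import random

def irwin_hall_scaled(n, lo=1.9, hi=2.1, rng=random):
    """Sum n uniforms on [0, 1), then rescale from [0, n] to [lo, hi]."""
    ih = sum(rng.random() for _ in range(n))  # Irwin-Hall draw in [0, n)
    return lo + ((hi - lo) / n) * ih

rng = random.Random(0)  # seeded only to make the demo repeatable
samples = [irwin_hall_scaled(12, rng=rng) for _ in range(10000)]
print(min(samples), max(samples), sum(samples) / len(samples))
```

Larger n concentrates the samples more tightly around the midpoint, since the spread of the rescaled sum shrinks like 1/sqrt(n).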

You first need to "bucket up" the range of interest, and of course you can do it with tools from scipy &c, but for the sake of understanding what's going on a little Python version might help - with no optimizations, for ease of understanding: import collections def buckets(discrete_set, amin=None,...

r,statistics,distribution,normal-distribution

Here is a simple implementation. Like @DanielJohnson says you can just use the cdf from the univariate normal, but it should be the same as using pmvnorm, shown below. The version using pnorm is much faster. ## Choose the matrix dimensions yticks <- xticks <- seq(-3, 3, length=100) side <- diff(yticks[1:2])...

r,statistics,distribution,random-sample

x <- lapply(c(1:20000), function(x){ lapply(c(1:2), function(y) rnorm(50,2.5,3)) }) This produces 20000 paired samples, where each sample is composed of 50 observations from a N(2.5,3^2) distribution. Note that x is a list where each slot is a list of two vectors of length 50. To t-test the samples, you'll need to...

In fact, it looks as though dlnorm3 (which is built into the FAdist package) already returns a zero probability when x<=thres, so plugging dlnorm3 straight into fitdistr appears to work fine: set.seed(12345) library(FAdist) library(MASS) X <- rlnorm3(n=100, shape = 2, scale = 1.5, thres = 1) fitdistr(X,dlnorm3,start=list(shape = 2, scale...

Here is a way using the distr package, which is designed for this. library(distr) p <- function(x) (2/pi) * (1/(exp(x)+exp(-x))) # probability density function dist <-AbscontDistribution(d=p) # signature for a dist with pdf ~ p rdist <- r(dist) # function to create random variates from p set.seed(1) # for reproduceable...

1) Use a library implementation, as suggested by Dima. Or, if you really feel a burning need to do this yourself: 2) Assuming you want to generate normals with a mean vector M and variance/covariance matrix V, perform Cholesky Decomposition on V to come up with lower triangular matrix L...
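For option 2, the standard construction is x = M + L z, where z is a vector of independent standard normals. The sketch below is a plain-Python illustration under that assumption; the helper names are hypothetical, and a library routine would normally be preferable.

```python
import math
import random

def cholesky(v):
    """Return lower-triangular L with L * L^T == v (v symmetric positive definite)."""
    n = len(v)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(v[i][i] - s)
            else:
                l[i][j] = (v[i][j] - s) / l[j][j]
    return l

def sample_mvn(m, v, rng=random):
    """Draw one vector from N(m, v) as m + L z, with z independent N(0, 1)."""
    l = cholesky(v)
    z = [rng.gauss(0.0, 1.0) for _ in m]
    return [m[i] + sum(l[i][k] * z[k] for k in range(i + 1)) for i in range(len(m))]

rng = random.Random(1)  # seeded only to make the demo repeatable
print(sample_mvn([0.0, 1.0], [[4.0, 2.0], [2.0, 3.0]], rng=rng))
```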

Won't you always know the number of trials (i.e., the size parameter)? If so, then try fitBinom=fitdist(data=scorebinom, dist="binom", fix.arg=list(size=8), start=list(prob=0.3)) to estimate p and its error....

http://en.cppreference.com/w/cpp/numeric/random/discrete_distribution/discrete_distribution You are initializing it with iterators, so you need to use this constructor: template< class InputIt > discrete_distribution( InputIt first, InputIt last ); So I'm guessing it should be std::discrete_distribution<int> dist(weights.begin(), weights.end()); But I haven't used discrete_distribution before....

First, cov is the name of the covariance function, so you had better call your variable something else, e.g. sigma. Second, you create the cov variable to be a 4-D array with value 0 at cov(1,1,1,1) and 0.5 at cov(1,1,1,2). Depending on how the covariance matrices look, the variable sigma can look different....

matlab,statistics,integration,distribution,symbolic-math

Similar to an answer several months ago, the Statistics Toolbox doesn't support the Symbolic Toolbox currently. Therefore, you can proceed by hard coding the PDF itself and integrating it: d = exp(-(log(x)-mu)^2/(2*sigma^2))/(x*sigma*sqrt(2*pi)); int(d, x, 0, 10); Or you can use the logncdf function, which may be cleaner....

This error happens due to the ODS setting. proc transreg can generate both plot and data. Because I want to save the data generated by proc transreg, I must turn the ODS graphics off. Adding the single line ods graphics off; before the macro (or changing the ODS registry setting) does this. The...

function,macros,sas,distribution,random-sample

What you need is the inverse cumulative distribution function. This is the function that is the inverse of the normalized integral of the distribution over the entire domain. So at 0% is your most negative possible value and 100% is your most positive. Practically though you would clamp to something...
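A short Python sketch of inverse-CDF sampling. The exponential distribution is used purely as an illustrative target because its inverse CDF has a closed form, and the clamping bound eps is an assumed value rather than one taken from the answer.

```python
import math
import random

def exponential_inverse_cdf(u, rate):
    """Inverse of F(x) = 1 - exp(-rate * x) for u in [0, 1)."""
    return -math.log(1.0 - u) / rate

def sample_from_inverse_cdf(inverse_cdf, rng=random, eps=1e-12):
    """Draw u ~ U(0, 1), clamp it away from 0% and 100%, then invert the CDF."""
    u = rng.random()
    u = min(max(u, eps), 1.0 - eps)  # clamp so the tails stay finite
    return inverse_cdf(u)

rng = random.Random(7)  # seeded only to make the demo repeatable
draws = [sample_from_inverse_cdf(lambda u: exponential_inverse_cdf(u, 2.0), rng=rng)
         for _ in range(20000)]
print(sum(draws) / len(draws))  # should be near the exponential mean 1 / rate
```

For a distribution without a closed-form inverse, the same wrapper can be fed a tabulated or numerically inverted CDF instead.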

matlab,statistics,distribution

If I understand correctly, you are asking how to decide which distribution to choose once you have a few fits. There are three major metrics (IMO) for measuring "goodness-of-fit": Chi-Squared, Kolmogorov-Smirnov, and Anderson-Darling. Which to choose depends on a large number of factors; you can randomly pick one or read the...

SQL Fiddle select quantity, count(*) from ( select user_name, count(*) as quantity from t where product_name = 'candle' group by user_name ) s group by quantity order by quantity ...

A ± 2 window means 2 words to the left and 2 words to the right of the target word. For target word "silence", the window would be ["gavel", "to", "the", "court"], and for "hammer", it would be ["when", "the", "struck", "it"].
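The window can be sketched in a few lines of Python. The token list below is a hypothetical sentence fragment chosen to be consistent with the "hammer" example; the original text is not shown here.

```python
def context_window(tokens, index, size=2):
    """Return up to `size` tokens on each side of tokens[index], excluding the target."""
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left + right

# Hypothetical fragment matching the "hammer" example in the answer.
tokens = ["when", "the", "hammer", "struck", "it"]
print(context_window(tokens, tokens.index("hammer")))
```

Near the start or end of the text the window is simply shorter, because the slices stop at the list boundaries.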

c#,excel,statistics,distribution,mathnet

I was very close. My Math.NET example was equivalent to (1 - TDIST(0.84, 8009, 2)), so I merely needed to subtract that from 1: double result = 1 - (2 * (1 - StudentT.CDF(0, 1, 8009, 0.84))) ...

r,matlab,statistics,distribution,probability

It appears that R's qt may use a completely different algorithm than Matlab's tinv. I think that you and others should report this deficiency to The MathWorks by filing a service request. By the way, in R2014b and R2015a, -Inf is returned instead of NaN for small values (about eps/8...

charts,flot,distribution,graphic,hour

Just convert your timestamps to hours, for example like this: $.each(d, function (index, datapoint) { datapoint[0] = (new Date(datapoint[0])).getHours(); }); (If you want values with AM / PM change accordingly.) And of course remove the mode: "time" from your options. See this fiddle for the changed code. ...

simulation,distribution,exponential-distribution

It's not an error to get zero. With exponential interarrival times and a rate of 3 per hour, the number of occurrences in an hour has a Poisson distribution with λ=3. The probability of getting n outcomes is e^(-λ) λ^n / n!, which for n=0 is just under 0.05. In other words, you...
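The claim can be checked empirically with a small simulation. This is a sketch, assuming arrivals are generated from exponential interarrival times at 3 per hour; the trial count and seed are arbitrary.

```python
import math
import random

def arrivals_in_one_hour(rate_per_hour, rng):
    """Count arrivals in [0, 1] hour using exponential interarrival times."""
    count = 0
    t = rng.expovariate(rate_per_hour)
    while t <= 1.0:
        count += 1
        t += rng.expovariate(rate_per_hour)
    return count

rng = random.Random(42)  # seeded only to make the demo repeatable
trials = 20000
zero_hours = sum(1 for _ in range(trials) if arrivals_in_one_hour(3.0, rng) == 0)
print(zero_hours / trials, math.exp(-3.0))  # empirical vs. theoretical P(N = 0)
```

Roughly one simulated hour in twenty should contain no arrivals at all, so an occasional zero is expected behaviour.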

There are lots of things wrong here. for(i in 1:simsize=simsize) should be throwing an error: > for(i in 1:simsize=simsize) { print(i)} Error: unexpected '=' in "for(i in 1:simsize=" Better is for(i in seq_len(simsize)) Then x <- function(ran.func) is not doing what you thought it was; it is returning a function...

python,list,distribution,percentage

This can trivially be achieved by setting a slice with a step: def select_elements(seq, perc): """Select a defined percentage of the elements of seq.""" return seq[::int(100.0/perc)] In use: >>> select_elements(range(10), 50) [0, 2, 4, 6, 8] >>> select_elements(range(10), 33) [0, 3, 6, 9] >>> select_elements(range(10), 25) [0, 4, 8] You...

java,random,distribution,nan,gaussian

In your code dRandom1 can be negative, while real logarithms only take arguments from (0, +inf)

This produces the same result: import numpy as np my_array = np.array([80.6, 120.8, -115.6, -76.1, 131.3, 105.1, 138.4, -81.3, -95.3, 89.2, -154.1, 121.4, -85.1, 96.8, 68.2]) ((my_array[:-1] * my_array[1:]) < 0).sum() gives: 8 and seems to be the fastest solution: %timeit ((my_array[:-1] * my_array[1:]) < 0).sum() 100000 loops, best of...

matlab,loops,for-loop,distribution,quantile

Your test variable is a three-dimensional variable, so when you do test2 = test(:,1); and then test2(:) <VaR_Calib_EVT/100 it's not the same as in your second example when you do test(:,i)<(VaR_Calib_EVT(:,i)/100) To replicate the results of your first example you could explicitly do the test2 assignment inside the loop, which...

r,distribution,normal-distribution,binning

If your range of data is from -2:2 with 15 intervals and the sample size is 77 I would suggest the following to get the expected heights of the 15 intervals: rn <- dnorm(seq(-2,2, length = 15))/sum(dnorm(seq(-2,2, length = 15)))*77 [1] 1.226486 2.084993 3.266586 4.716619 6.276462 7.697443 8.700123 9.062576 8.700123...
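For readers without R at hand, the same expected heights can be reproduced with a short Python sketch. The function name and defaults are illustrative; the calculation mirrors the dnorm expression above.

```python
import math

def expected_heights(lo=-2.0, hi=2.0, bins=15, n=77):
    """Normal density at evenly spaced points, rescaled so the heights sum to n."""
    xs = [lo + (hi - lo) * i / (bins - 1) for i in range(bins)]
    dens = [math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi) for x in xs]
    total = sum(dens)
    return [n * d / total for d in dens]

heights = expected_heights()
print([round(h, 6) for h in heights])
```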

I used wolfram alpha to find the ICDF of the GEV for ξ!=0: here and ξ==0: here Here is an implementation: #include <iostream> #include <random> #include <cmath> double icdf(double x, double mu, double sigma, double xi) { if(xi == 0) { return (mu - sigma * log(-log(x))); } else {...

Well, I've been playing around with this and have come up with a solution that will work for my purposes. I essentially mix the larger items into the smaller items and loop back through until I run out of larger items. For Each item In smallerList mergedList.add(smallerID) Next itemsRemaining =...

fitdist is expecting a density/distribution function with named arguments. library("lmomco") library("fitdistrplus") ## reproducible: month <- c(27.6, 97.9, 100.6, 107.3, 108.5, 109, 112.4, 120.9, 137.8) Setup: lmom <- lmoms(month,nmom=5) #from lmomco package para <- pargev(lmom, checklmom=TRUE) dgev <- pdfgev #functions which are included in lmomco pgev <- cdfgev fitgev <- fitdist(month,...

combinations,permutation,distribution

Based on your response in the comments above, here's a recursive solution in Ruby: $resolution = 100 $res_f = $resolution.to_f def allocate(remainder, bin_number, all_bin_values, number_of_bins) if bin_number >= number_of_bins all_bin_values << remainder / $res_f puts all_bin_values.join(", ") all_bin_values.pop else remainder.downto(1) do |i| if remainder - i >= number_of_bins - bin_number...

A simple, if rather naive, scheme is to sum the absolute differences between your observations and a perfectly uniform distribution red = abs(4 - 7/4) = 9/4 blue = abs(0 - 7/4) = 7/4 orange = abs(2 - 7/4) = 1/4 purple = abs(1 - 7/4) = 3/4 for a...
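The scheme can be written as a small Python helper using the counts from the example (4, 0, 2 and 1, for a total of 7). The function name is illustrative.

```python
def uniformity_deviation(counts):
    """Sum of absolute differences between each count and the uniform expectation."""
    expected = sum(counts) / len(counts)
    return sum(abs(c - expected) for c in counts)

observed = {"red": 4, "blue": 0, "orange": 2, "purple": 1}
score = uniformity_deviation(list(observed.values()))
print(score)  # 9/4 + 7/4 + 1/4 + 3/4
```

A score of zero means the counts are perfectly uniform; larger values mean a more lopsided distribution.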

from itertools import izip_longest as izip_l, chain config = {'A': 3, 'B': 4, 'C': 2} # Reconstruct the list of lists expanded = [[k] * config[k] for k in config] # Just zip them and ignore the None print[item for item in chain.from_iterable(izip_l(*expanded)) if item] If the count is too...

Use rejection sampling on the graph of the Lévy PDF [answer from user: lhf] Example of code: function rejectionSampling() repeat local x = random.uniform(1) local y = random.uniform(1.5) -- PDF maximum peak at x=1/3 --> y~1.45 fx = math.sqrt(1/(2*math.pi))*math.exp(-1/(2*x))/(x^1.5) --PDF until y < fx return x end ...

I constructed something on my own using your mm data. First let's plot the density of mm in order to visualise the modes: plot(density(mm)) So, we can see there are 2 modes in this distribution. One around 600 and one around 1000. Let's see how to find them. In order...

r,distribution,truncation,gamma-distribution

From the rgamma help page: "Invalid arguments will result in return value NaN, with a warning." If this is what you see, you could use ow <- options("warn") options(warn=2) G0 <- try(Gammad(scale = s, shape = sh), silent=TRUE) if(inherits(G0, "try-error")) # handle invalid arguments options(warn=ow) ...

javascript,algorithm,math,statistics,distribution

You mention a logarithmic distribution, but it looks like your code is designed to generate a truncated geometric distribution instead, although it is flawed. There is more than one distribution called a logarithmic distribution and none of them are that common. Please clarify if you really do mean one of...

c++,algorithm,random,distribution

For PDF proportional to some linear function, CDF will be proportional to x squared . Thus, sampling would require sqrt(), something along the lines x = xmin + sqrt(urand())*(xmax - xmin); y = ymin + sqrt(urand())*(ymax - ymin); where urand() is U(0,1) RNG (probably equal to rand()/RAND_MAX, but I've abandoned...
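The square-root trick can be illustrated for one coordinate in Python. This sketch assumes the density rises linearly from zero at the lower bound, so the CDF is ((x - lo) / (hi - lo))^2; the function name is hypothetical.

```python
import math
import random

def sample_linear(lo, hi, rng=random):
    """Sample x in [lo, hi) with density proportional to (x - lo)."""
    return lo + math.sqrt(rng.random()) * (hi - lo)

rng = random.Random(3)  # seeded only to make the demo repeatable
xs = [sample_linear(0.0, 1.0, rng=rng) for _ in range(20000)]
print(sum(xs) / len(xs))  # the theoretical mean on [0, 1] is 2/3
```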

c++,distribution,factorial,poisson

One workaround to deal with large n's is calculating the distribution in the log domain: X = ((e^-lambda)*(lambda^n))/n! ln X = -lambda + n*ln(lambda) - Sum(ln(k)) for k = 1..n return e^(ln X) ...
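A Python sketch of the log-domain evaluation. It uses math.lgamma(n + 1) for ln(n!), which equals the sum of ln(k) for k = 1..n; the function name is illustrative.

```python
import math

def poisson_pmf(n, lam):
    """P(N = n) for a Poisson(lam) variable, evaluated via logarithms."""
    log_p = -lam + n * math.log(lam) - math.lgamma(n + 1)  # lgamma(n + 1) == ln(n!)
    return math.exp(log_p)

print(poisson_pmf(0, 3.0))        # equals exp(-3)
print(poisson_pmf(1000, 1000.0))  # the direct formula would overflow on 1000!
```

Only the final exponentiation leaves the log domain, so the intermediate values stay well within floating-point range even when n! itself would not.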

matlab,image-processing,histogram,distribution,vision

If you mean "distinguish" by "separate", then yes: The property you describe is called bimodality, i.e. you have 2 peaks that can be separated by one threshold. So your question is actually "how do I test for an underlying bimodal distribution?" One option to do this programmatically is Binning. This...

javascript,arrays,distribution,content-length

Simple maths. If you have n items and want to split them into k chunks, every chunk will have n / k items. But what if n / k isn't an integer? Then some of the chunks will have n / k rounded down items and some will have n...
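One way to realise this split, sketched in Python rather than JavaScript: the first n mod k chunks get one extra item, and the rest get n / k rounded down. Which chunks receive the extra item is an assumption made here, not something stated above.

```python
def split_into_chunks(items, k):
    """Split items into k contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), k)
    chunks, start = [], 0
    for i in range(k):
        size = base + 1 if i < extra else base  # first `extra` chunks are larger
        chunks.append(items[start:start + size])
        start += size
    return chunks

print(split_into_chunks(list(range(10)), 3))
```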

The following code should work. I've assumed here that the total number of items in the list shouldn't change. public List<List<Item>> distribute(List<Item> list, int y, int z) { int x = list.size(); int nLists = (int) Math.ceil((double)x/y); // Create result lists List<List<Item>> result = new ArrayList<>(); for (int j =...

Urm, the simplest solution I can think of so far is: Dim list As List(Of Object) Dim random As System.Random = New System.Random() Dim Countries() As String = {"China", "China", "China", "China", _ "Japan", "Japan", "Japan", _ "South Korea", _ "North Korea", _ "Taiwan"} For i As Integer = 1 To...

Try the rv package. Note that if X is an exponential random variable with mean 1, then -log(X) has a standard Gumbel distribution.
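The transformation can be tried out directly. This Python sketch is only an illustration of the identity and not a use of the rv package; the function name is hypothetical.

```python
import math
import random

def standard_gumbel(rng=random):
    """Draw X ~ Exponential(mean 1) and return -log(X), a standard Gumbel variate."""
    x = rng.expovariate(1.0)
    while x == 0.0:  # guard against log(0) in the (vanishingly rare) zero draw
        x = rng.expovariate(1.0)
    return -math.log(x)

rng = random.Random(5)  # seeded only to make the demo repeatable
draws = [standard_gumbel(rng) for _ in range(20000)]
print(sum(draws) / len(draws))  # the standard Gumbel mean is about 0.5772
```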

ios,app-store,special-characters,itunesconnect,distribution

In my opinion contacting Apple will be a good idea, because according to the iTunes Connect Developer Guide: Note: The first time you add an app is your only chance to set a company name distinct from your legal entity name. So my suggestion is: don't take the risk, contact Apple....

You will need to make non-trivial changes to the source of mmedist -- I recommend that you copy out the code, and make your own function foo_mmedist. The first change you need to make is on line 94 of mmedist: if (!exists("memp", mode = "function")) That line checks whether "memp"...

android,gradle,distribution,crashlytics

Ohh, I did miss something.. Don't forget to add your flavor name to: crashlyticsUploadDistribution<FLAVOR>Release ...

r,distribution,normal-distribution,beta-distribution

Use uniroot(). uniroot(function(x) dbeta(x, 1, 2)-dnorm(x, 0, 1), c(0, 1)) ## $root ## [1] 0.862456 ## ## $f.root ## [1] 5.220165e-05 ## ## $iter ## [1] 3 ## ## $estim.prec ## [1] 6.103516e-05 This solves an equation dbeta(x, ...) == dnorm(x, ...) w.r.t. x (in the interval [0,1], as this...

Rather than roll your own calculation, it would be easier to use the tm package. Convert myTbl to a term document matrix (tdm) library(tm) tdm <- TermDocumentMatrix(myTbl) # there are many more clean up steps, but I am simplifying Then you have not just Zipf but also Heaps and...

python,optimization,distribution,cx-freeze,pyglet

You can use the same -O flag when you run cx_freeze to generate your final build, meaning that the cx_freeze generated bytecode will already be optimized. From the cxfreeze docs: cxfreeze hello.py --target-dir dist Further customization can be done using the following options: ... -O optimize generated bytecode as per...

javascript,arrays,angularjs,distribution

I might have found the solution myself. Thanks to @pankajparkar I understood that I was not saving values in a proper array. Starting from that I thought that an object would have been much more appropriate. So, gathered the numbers in a more "clean" object like: {"34":0.04,"35":0.1,"36":0.12,"37":0.14,"38":0.16,"39":0.18,"40":0.22,"41":0.28,"43":0.36,"44":0.38,"45":0.42,"46":0.48,"47":0.52,"48":0.6,"49":0.64,"50":0.7,"51":0.8,"52":0.88,"53":0.92,"54":0.94,"55":1} I am getting...

excel,vba,excel-vba,distribution

Excel can handle the T distribution: Sub test() Debug.Print Application.WorksheetFunction.T_Dist(0.95, 20, False) 'pdf value Debug.Print Application.WorksheetFunction.T_Dist(0.95, 20, True) 'cdf value End Sub Output: 0.247866113626846 0.823274027108808 ...

c++,c++11,random,distribution,prng

Interesting question. So I was wondering if interfering with how the distribution works by constantly resetting it (i.e. recreating the distribution at every call of get_int_from_range) I get properly distributed results. I've written code to test this with uniform_int_distribution and poisson_distribution. It's easy enough to extend this to test another...

You haven't said what you are trying to achieve. From looking at that Wikipedia article, if you want to generate different values for x, it seems to me that you need to say something like, if (rand < f(a,b,c)) { x = a + Math.Sqrt(rand * (b-a) * (c-a)); }...

algorithm,optimization,distribution

Let z be a real parameter. My understanding of the problem is that you want to find z such that, when bucket i is allocated max(MINi, min(MAXi, Wi z)), the sum of allocations equals x. Here's an O(n log n)-time algorithm (there's probably a linear-time one, but if it does...
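As a simpler alternative to the O(n log n) algorithm, the parameter z can also be found numerically by bisection, because the total allocation never decreases as z grows. The Python sketch below takes that swapped-in approach; it assumes positive weights and a target x between the sum of the minimums and the sum of the maximums, and all names are hypothetical.

```python
def allocate(x, mins, maxs, weights, iterations=100):
    """Find z by bisection so that the clamped allocations sum to x (approximately)."""
    def allocations(z):
        return [max(lo, min(hi, w * z)) for lo, hi, w in zip(mins, maxs, weights)]

    low = 0.0
    high = max(hi / w for hi, w in zip(maxs, weights))  # every bucket is capped here
    for _ in range(iterations):
        mid = (low + high) / 2.0
        if sum(allocations(mid)) < x:
            low = mid
        else:
            high = mid
    return allocations(high)

print(allocate(8.0, [0.0, 0.0], [10.0, 10.0], [1.0, 3.0]))   # split in proportion 1:3
print(allocate(16.0, [0.0, 0.0], [10.0, 10.0], [1.0, 3.0]))  # second bucket hits its max
```

Bisection gives an approximate answer to within floating-point tolerance, whereas a sort-based algorithm can locate the exact breakpoint.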

python,r,distribution,kernel-density

I would plot the empirical cumulative distribution function. This makes sense because the comparison of these two functions is also the basis for the Kolmogorov–Smirnov test for the significance of the difference of the two distributions. There are at least two options to plot these functions in R: plot(ecdf(data$X.ofTotal),col="green",xlim=c(0,1),verticals =...

There are no such traits in the standard library. You can just write something like template<typename T> struct is_distribution : public std::false_type {}; and specialize it for each type that is a distribution: template<typename T> struct is_distribution<std::uniform_int_distribution<T> > : public std::true_type {}; Then just template<typename Distr> typename std::enable_if<is_distribution<Distr>::value>::type print_random(Distr& d) { cout <<...

c#,.net,statistics,distribution,weibull

You don't necessarily need a 3-parameter Weibull. Your non-failing data is called right-censored and requires survival analysis. A straightforward maximum likelihood approach should work here using just a 2-parameter Weibull model.

sql,sql-server,distribution,probability

SQL-Server does not incorporate a lot of statistical functions. tinv is not present in SQL-Server. The only way to add a tinv function is to use a CLR-Function. Thus, the problem reduces itself to "How do I calculate tinv with the subset of C# allowed in SQL-Server?". If you're...

random,numbers,distribution,probability

The most straightforward way that I can see is this. Assuming that you have a large number of points {f(X1),...,f(Xn)}, plot them as a distribution and fit a generalized Gaussian distribution curve through them. After this, you can use rejection sampling to generate further numbers from the same distribution.

I deleted my other answer because it was simply wrong! Then it occurred to me that there is a much simpler method: public static List<List<Item>> distribute(List<Item> items, int y, int z) { // Create list of items * z List<Item> allItems = new ArrayList<>(); for (int i = 0; i...

simulation,distribution,exponential

What you're doing is called a time-step simulation, and can be terribly inefficient. Each tick in your master clock for loop represents a delta-t increment in time, and in each tick you have a laundry list of "did this happen?" possible updates. The larger the time ticks are, the lower...

r,distribution,gamma-distribution

There are several things going on here. First, you calculate the scales as: scale.list <- runif(1000, max = 100000, min = 100000) but since min = max, all the values are identical. Second, you do not specify lb.arg or ub.arg, so I set them to 20 and 50 arbitrarily. Third,...

To evaluate the pdf at abscissas, you would pass abscissas as the first argument to pdf. To specify the parameters, use the * operator to unpack the param tuple and pass those values to distr.pdf: pdf = distr.pdf(abscissas, *param) For example, import numpy as np import scipy.stats as stats distrNameList...

python,numpy,scipy,distribution

The loc parameter always shifts the x variable. In other words, it generalizes the distribution to allow shifting x=0 to x=loc. So that when loc is nonzero, maxwell.pdf(x) = sqrt(2/pi) * x**2 * exp(-x**2/2), for x > 0 becomes maxwell.pdf(x, loc) = sqrt(2/pi) * (x-loc)**2 * exp(-(x-loc)**2/2), for x > loc. The doc string...
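The effect of loc can be seen in a hand-written version of the density. This is a plain-Python sketch of the formula above and not scipy's implementation; the function name is illustrative.

```python
import math

def maxwell_pdf(x, loc=0.0):
    """Maxwell density with the x variable shifted by loc; zero at or below loc."""
    y = x - loc
    if y <= 0.0:
        return 0.0
    return math.sqrt(2.0 / math.pi) * y ** 2 * math.exp(-y ** 2 / 2.0)

print(maxwell_pdf(1.0))           # unshifted density at x = 1
print(maxwell_pdf(3.0, loc=2.0))  # same value: the curve has moved right by 2
```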

ios,frameworks,app-store,distribution

Your client needs to sign with their own cert, as the app is compiled and linked with your framework. Using your framework doesn't require your cert for signing unless you did something wrong when building the framework.

matlab,distribution,sampling,random-sample

So, you can use this, for example: y = 0.8 + rand*0.4; This will generate a random number between 0.8 and 1.2. Because rand creates a uniform distribution, I believe that rand*0.4 creates the same ;) ...

distribution,netlogo,gamma-distribution,beta-distribution

to-report random-pert [#minval #likeval #maxval] ;use pert params to draw from a beta distribution if not (#minval <= #likeval and #likeval <= #maxval) [error "wrong argument ranking"] if (#minval = #likeval and #likeval = #maxval) [report #minval] ;;handle trivial inputs let pert-var 1. / 36 let pert-mean (#maxval + 4...

Interesting question. I'll sum it up: We need a function f(x); f returns an integer; if we run f a million times, the average of the integers is x (or very close at least). I am sure there are several approaches, but this uses the binomial distribution: http://en.wikipedia.org/wiki/Binomial_distribution Here is the...

performance,distribution,netezza,query-plans

The workaround is to force exhaustive planner to be used. set num_star_planner_rels = X; -- Set X to very high. According to IBM Netezza team, queries with more than 7 entities (# of tables) will use a greedy query planner called "Snowflake". At 7 or less entities, it will use...

r,distribution,estimation,probability-density,weibull

Here is a better attempt, like before it uses optim to find the best value constrained to a set of values in a box (defined by the lower and upper vectors in the optim call). Notice it scales x and y as part of the optimization in addition to...