r,statistics,distribution,random-sample

x <- lapply(c(1:20000), function(x){ lapply(c(1:2), function(y) rnorm(50,2.5,3)) }) This produces 20000 paired samples, where each sample is composed of 50 observations from a N(2.5, 3^2) distribution. Note that x is a list where each slot is itself a list of two vectors of length 50. To t-test the samples, you'll need to...

algorithm,hadoop,mapreduce,sample,random-sample

I'm not sure what "elegant" means, but perhaps you're interested in something analogous to reservoir sampling. Let k be the size of the sample and initialize a k-element array with nulls. The elements from which we are sampling arrive one by one. When the jth (counting from 1) element arrives,...
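
The one-by-one replacement scheme described above is classic reservoir sampling; a minimal Python sketch (the function name is invented here), with j counting from 1 as in the answer:

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    reservoir = []
    for j, item in enumerate(stream, start=1):  # j counts from 1
        if j <= k:
            reservoir.append(item)              # fill the k slots first
        else:
            slot = rng.randrange(j)             # replace a slot with probability k/j
            if slot < k:
                reservoir[slot] = item
    return reservoir
```

If the stream has fewer than k elements, the reservoir simply ends up holding all of them.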

I think your problem can be solved by generating the distribution in a reactive function like so: get_observations <- reactive( { return(rnorm(input$observations,mean=0,sd=1)) }) if (input$individual_obs) { rug(get_observations(), col = "red") } if (input$density) { dens <- density(get_observations(), kernel = input$kernel, adjust = input$bw_adjust) lines(dens, col = "blue") } get_observations will...

A simple/easy way to do this is to create an array of integers from 0 to n - 1 where n is the length of the first array. Shuffle this array, and then use the values in it as indices for iteration over the original array. There's no standard shuffle...
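
In Python you would normally just call random.shuffle, but the manual Fisher-Yates shuffle of an index array that the answer alludes to can be sketched like this (names invented here):

```python
import random

def shuffled_indices(n, rng=random):
    """Fisher-Yates shuffle of the indices 0..n-1."""
    idx = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i + 1)        # pick a slot among idx[0..i]
        idx[i], idx[j] = idx[j], idx[i]
    return idx

# iterate over two parallel arrays in the same random order
a = ["w", "x", "y", "z"]
b = [10, 20, 30, 40]
order = shuffled_indices(len(a))
pairs = [(a[i], b[i]) for i in order]
```

Because both arrays are indexed through the same shuffled index list, corresponding elements stay paired.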

opencv,random-sample,probability-density

As far as I know, OpenCV has no functions for your task, but using RNG::uniform you can generate samples as you want; take a look at this paper.

python,algorithm,random-sample

If you know in advance the total number of items that will be yielded by an iterable population, it is possible to yield the items of a sample of population as you come to them (not only after reaching the end). If you don't know the population size ahead of...
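
When the population size n is known in advance, Knuth's selection sampling (Algorithm S) yields each chosen item the moment it is reached, which is what the answer describes; a Python sketch (generator name invented here):

```python
import random

def select_sample(iterable, n, k, rng=random):
    """Yield a uniform sample of k items from an iterable of known length n,
    emitting each chosen item as soon as it is seen (Knuth's Algorithm S)."""
    chosen = 0
    for seen, item in enumerate(iterable):
        remaining = n - seen
        needed = k - chosen
        # select this item with probability needed/remaining
        if rng.randrange(remaining) < needed:
            yield item
            chosen += 1
            if chosen == k:
                return
```

Items are yielded in their original order, and exactly k of them are always produced.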

Here's a straightforward implementation of rejection sampling. There may be a faster way to do the adjacency check than the query_pairs approach (which in this case will also check for collisions), since you only want to test whether at least one pair falls within the distance threshold. import...

language-agnostic,iterator,random-sample

Encryption is reversible, hence an encryption is a one-to-one mapping from a set onto itself. Pick a block cypher with a large enough block size to cover the number of items you have. Encrypt the numbers 0, 1, 2, 3, 4, ... This will give you a non-repeating ordered list...
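
The principle can be illustrated without a real block cypher: a toy Feistel network (a sketch only, not cryptographically strong, with names invented here) is a bijection on 32-bit integers, so encrypting 0, 1, 2, ... yields a non-repeating pseudorandom sequence:

```python
import hashlib

def feistel_permute(x, key, rounds=4, half_bits=16):
    """Toy 4-round Feistel network: a bijection on 32-bit integers.
    Each round XORs one half with a hash-derived round function of the other,
    then swaps halves; the construction is invertible, hence one-to-one."""
    mask = (1 << half_bits) - 1
    left, right = x >> half_bits, x & mask
    for r in range(rounds):
        digest = hashlib.sha256(f"{key}:{r}:{right}".encode()).digest()
        f = int.from_bytes(digest[:2], "big")   # 16-bit round function output
        left, right = right, left ^ f
    return (left << half_bits) | right

# encrypting consecutive integers gives distinct "random-looking" values
seq = [feistel_permute(i, key=42) for i in range(1000)]
```

For real use you would pick an actual cipher whose block size covers your item count, exactly as the answer says.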

matlab,random-sample,deterministic

As I stated in my comment, I don't really understand what you're asking. But, I will answer this as if you had asked it on codereview. The following is not good practice in MATLAB: A1=24; A2=23; A3=23; A4=23; A5=10; There are very few cases (if any), where you actually...

I think you may be misunderstanding what rdirichlet(...) does (BTW: you do have to spell it correctly...). rdirichlet(n,alpha) returns a matrix with n rows and length(alpha) columns. Each entry is a random deviate taken from the gamma distribution with shape parameter given by the corresponding element of alpha, and each row is then normalized...
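
Assuming the standard gamma construction of the Dirichlet (gamma draws with shape parameters alpha, each row normalized to sum to 1), the behaviour of rdirichlet can be mirrored in NumPy as a sketch:

```python
import numpy as np

def rdirichlet(n, alpha, rng=None):
    """Draw n Dirichlet(alpha) deviates: one gamma draw per column
    (shape parameter = the corresponding alpha), then normalize each row."""
    rng = rng or np.random.default_rng(0)
    g = rng.gamma(shape=np.asarray(alpha), size=(n, len(alpha)))
    return g / g.sum(axis=1, keepdims=True)

draws = rdirichlet(5, [1.0, 2.0, 3.0])   # 5 rows, 3 columns, rows sum to 1
```

NumPy also provides rng.dirichlet directly; the explicit gamma version just makes the construction visible.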

If I understand your question correctly, I don't think the randi function is the way to start here. I would suggest the following procedure: Start with a list with 500*500 elements, with 7000 elements set to 1 and the rest to 0 Randomize the order of elements in the list...
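
The suggested procedure (a flat list with 7000 ones, shuffled, then reshaped) translates directly to NumPy; a sketch with an invented function name:

```python
import numpy as np

def random_binary_matrix(rows, cols, n_ones, rng=None):
    """Place n_ones ones uniformly at random in a rows x cols zero matrix."""
    rng = rng or np.random.default_rng(0)
    flat = np.zeros(rows * cols, dtype=int)
    flat[:n_ones] = 1          # list with n_ones ones, the rest zeros
    rng.shuffle(flat)          # randomize the order of the elements
    return flat.reshape(rows, cols)

m = random_binary_matrix(500, 500, 7000)
```

Because the shuffle is uniform over all orderings, every cell is equally likely to hold a one.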

python,string,python-2.7,random,random-sample

numbers = ["1", "2", "3"] letters = ["X", "Y", "Z"] from random import sample, shuffle samp = sample(letters,2)+sample(numbers*3,8) shuffle(samp) print("".join(samp)) 113332X2Z2 Or use choice and range: from random import sample, shuffle,choice samp = sample(letters,2)+[choice(numbers) for _ in range(8)] shuffle(samp) print("".join(samp)) 1212ZX1131 ...

java,filewriter,random-sample,bufferedwriter,file-writing

In this section of your code while ((line = buf.readLine()) != null) { lineCopy=line; String [] LineArray=lineCopy.split(","); lineCopy=LineArray[0]; if (word.equals(lineCopy)) { System.out.println(line); writer1.write(line); } writer1.newLine(); } writer.close(); You should probably replace writer.close() with writer1.close(), since the only other writer variable that appears in your code is a local variable in your...

r,data.frame,repeat,random-sample

Basically your question boils down to how to replace randomly selected elements of your data with 0. You can do this pretty simply with runif, in this case replacing each value with 0 with probability 0.1: set.seed(144) data[-1] <- sapply(data[-1], function(x) ifelse(runif(length(x)) < 0.1, 0, x)) data # id X1...

r,data.frame,plyr,random-sample

dat$x3 <- ave(dat$x2, dat$x1, FUN=sample) Given the way you have constructed the output (with the same number of entries as there were rows of the dataframe), you will get permutations of x2 values within distinct values of x1. (Edited your code to make it run.)...

The only way to figure out what is fastest for you is to do a comparison of the different methods. In fact the loop appears to be very fast in this case! pop = randn(1,100); n = [1 3 10 6 2]; tic sr = @(n) sum(randsample(pop,n)); sum_sample = arrayfun(sr,n);...

Yeah, you should use random.sample(). It will make your code cleaner, and as a bonus you get a performance increase. Performance issues with your loop solution: a) it has to check the output list before accepting any number, and b) it does rejection sampling, so the expected time will be higher...
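
A minimal Python comparison of the two approaches (the loop version is deliberately naive, to show the rejection-and-membership-check pattern the answer criticizes):

```python
import random

def unique_picks_loop(population, k, rng=random):
    """Naive rejection loop: re-draw until k distinct values are collected."""
    out = []
    while len(out) < k:
        x = rng.choice(population)
        if x not in out:       # O(len(out)) membership check on every draw
            out.append(x)
    return out

# random.sample does the same job in one call, with no rejections
pop = list(range(100))
a = unique_picks_loop(pop, 10)
b = random.sample(pop, 10)
```

Both produce k distinct values, but random.sample avoids the repeated scans and wasted draws.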

java,statistics,boxplot,random-sample

Suppose that the min, a, median, b, max values separate the quartiles of the distribution (http://en.wikipedia.org/wiki/Quartile): static public double next(Random rnd, double median, double a, double b, double min, double max) { double d = -3; while (d > 2.698 || d < -2.698) { d = rnd.nextGaussian(); } if (Math.abs(d) < 0.6745)...

I would sample once and turn the result into a data.frame, which can be passed to paste0: set.seed(42) do.call(paste0, as.data.frame(matrix(sample(LETTERS, 50, TRUE), ncol = 5))) #[1] "XLXTJ" "YSDVL" "HYZKA" "VGYRZ" "QMCAL" "NYNVY" "TZKAX" "DDXFQ" "RMLXZ" "SOVPQ" ...

In principle, you want to do this using expand.grid I believe. Using your example data, I worked out the basics here: dat <- data.frame(A = c(1, 4, 5, 3, NA, 5), B = c(6, 5, NA, 5, 3, 5), C = c(5, 3, 1, 5, 3, 7), D = c(5,...

One way is to assign a cumulative sum column to mtcars so you're not having to recalculate that all the time. mtcars$cumsum <- cumsum(mtcars$count) Car.ID <- function(x) { if (x < mtcars$cumsum[1]) { return(paste(rownames(mtcars)[1], x, sep = ":")) } else { row <- tail(which(mtcars$cumsum < x), n = 1) return(paste(rownames(mtcars)[row...
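
The same precomputed-cumsum lookup can be sketched in Python, replacing the linear tail/which scan with a binary search (names and counts here are illustrative stand-ins for rownames(mtcars) and mtcars$count):

```python
import bisect
import itertools

names = ["Mazda", "Datsun", "Hornet"]
counts = [3, 2, 5]
# precompute the running total once, as the answer does with mtcars$cumsum
cumsum = list(itertools.accumulate(counts))   # [3, 5, 10]

def id_for_draw(x):
    """Map a draw x in [0, sum(counts)) to the name whose cumulative-count
    interval contains it, via binary search on the precomputed cumsum."""
    row = bisect.bisect_right(cumsum, x)   # first interval whose bound exceeds x
    return f"{names[row]}:{x}"
```

Draws 0-2 map to the first name, 3-4 to the second, 5-9 to the third.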

The problem lies in hist, not in sample. You can check that by doing: > table(sample(0:15, 10000, replace=T)) 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 634 642 664 654 628 598 633 642 647 625 587 577 618 645 615 591 From...

c++,algorithm,random-sample,primality-test

You might want to take a look at the Miller-Rabin primality test. In this test you use a series of "witness" values and perform some calculations. Each witness calculation gives a result of "composite" or "possibly prime". If you use k witnesses and they all give "possibly prime" results, the...
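
A compact Python sketch of the Miller-Rabin test described above (the witness count k is tunable; the small-prime pre-checks are an addition here, not part of the core test):

```python
import random

def is_probable_prime(n, k=20, rng=random):
    """Miller-Rabin with k random witnesses: returns False if some witness
    proves n composite, True if all k say "possibly prime"."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # write n-1 as d * 2^s with d odd
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(k):
        a = rng.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue            # this witness says "possibly prime"
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False        # witness proves n composite
    return True
```

Each composite number passes a random witness with probability at most 1/4, so k witnesses bound the error at 4^-k.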

r,for-loop,web-scraping,random-sample

Use something like this. Loop over all the product indices in random order. for (i in sample(1:x)){ <Your code here> # Sleep for 120 seconds Sys.sleep(120) } And if you want to do 10 at a time, sleep for 120 seconds every 10 executions. n = 1 for (i in sample(1:x)){ #...

Initially give each track a weight w, e.g. 10 - a vote up increases this, down reduces it (but never to 0). Then when deciding which track to play next: Calculate the total of all the weights, generate a random number between 0 and this total, and step through the...
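
The vote-weighted selection described above can be sketched in Python (names and weights here are illustrative; a real player would update the weights on each vote):

```python
import random

def pick_track(weights, rng=random):
    """Weighted pick: draw r in [0, total) and step through the weights
    until the running total passes r."""
    total = sum(weights.values())
    r = rng.uniform(0, total)
    running = 0.0
    for track, w in weights.items():
        running += w
        if r < running:
            return track
    return track  # floating-point edge case where r lands exactly on total

# every track starts at weight 10; votes move it up or down (never to 0)
weights = {"track_a": 10, "track_b": 30, "track_c": 10}
```

Over many picks, track_b (weight 30) should come up about three times as often as each of the others.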

function,macros,sas,distribution,random-sample

What you need is the inverse cumulative distribution function. This is the function that is the inverse of the normalized integral of the distribution over the entire domain. At 0% you get your most negative possible value and at 100% your most positive. Practically, though, you would clamp to something...
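
Inverse-transform sampling works the same way in any language; a Python sketch using the exponential distribution (chosen here only because its inverse CDF has a simple closed form):

```python
import math
import random

def sample_exponential(rate, rng=random):
    """Inverse-transform sampling: push a Uniform(0,1) draw through the
    inverse CDF. For Exponential(rate), F^-1(u) = -ln(1-u)/rate."""
    u = rng.random()           # a point between 0% and 100% of the CDF
    return -math.log(1.0 - u) / rate

rng = random.Random(1)
draws = [sample_exponential(2.0, rng) for _ in range(10000)]
```

The sample mean of Exponential(2) draws should sit near 1/rate = 0.5.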

sql,postgresql,indexing,random-sample,postgresql-performance

After initializing max_id as max(id) - 1000 to leave room for 1000 rows, this should be using the index: UPDATE table SET test = true FROM (SELECT (random() * max_id)::bigint AS lower_bound) t WHERE id BETWEEN t.lower_bound AND t.lower_bound + 999; No need for the complicated structure with a CTE...

python-2.7,numpy,random-sample

The way you're doing it is sound. However, you could use the more intuitive nonzero function: random.sample(visited.nonzero(), k) EDIT: As to the second question in your comment, you can invert the "zeroness" of your array: visited==0. You get: random.sample((visited==0).nonzero(), k) ...

By using process substitution (thanks Tom Fenech), both commands are seen as files. Then using cat we can concatenate these "files" together and output to STDOUT. cat <(awk '/^#/' file) <(awk '!/^#/' file | shuf -n 10) Input #blah de blah 1 2 3 4 5 6 7 8 9...

copy,sample,julia-lang,deep-copy,random-sample

Presumably you only need to copy if the different copies will later mutate in different ways. If there's just breeding and selection with no mutation, then a reference to the "copied" individual would be sufficient. FYI deepcopy is (in current Julia releases) slow; if you need performance, you should write...

This doesn't answer your question about how to do this with the "sampling" package, but I've written a function called stratified that will do this for you. If you have "devtools" installed, you can load it like this: library(devtools) source_gist(6424112) Otherwise, just copy the code of the function from the...

arrays,random,julia-lang,random-sample

Use the StatsBase.jl package, i.e. Pkg.add("StatsBase") # Only do this once, obviously using StatsBase items = ["a", 2, 5, "h", "hello", 3] weights = [0.1, 0.1, 0.2, 0.2, 0.1, 0.3] sample(items, WeightVec(weights)) Or if you want to sample many: # With replacement my_samps = sample(items, WeightVec(weights), 10) # Without replacement...

java,algorithm,sorting,random,random-sample

Modified Fisher-Yates algorithm The shuffle solution can be improved, since you only have to shuffle the first k elements of the array. But that's still O(n) because the naïve shuffle implementation requires an array of size n, which needs to be initialized to the n values from 0 to n-1....
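
One way to avoid the O(n) initialization, sketched here in Python rather than Java: keep only the modified positions of the virtual array in a hash map, so a partial Fisher-Yates over k draws touches O(k) memory (function name invented here):

```python
import random

def sample_indices(n, k, rng=random):
    """Partial Fisher-Yates over a virtual array of 0..n-1, stored sparsely:
    returns k distinct indices drawn uniformly, using O(k) extra memory."""
    swapped = {}               # positions whose value differs from the identity
    out = []
    for i in range(k):
        j = rng.randrange(i, n)
        out.append(swapped.get(j, j))      # read virtual slot j (default: j)
        swapped[j] = swapped.get(i, i)     # move slot i's value into slot j
    return out
```

Unvisited positions implicitly hold their own index, so the full array of size n is never allocated.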

matlab,distribution,sampling,random-sample

So, you can use this, for example: y = 0.8 + rand*0.4; This will generate a random number between 0.8 and 1.2: rand draws from a uniform distribution on (0,1), so rand*0.4 is uniform on (0, 0.4). ...

python,numpy,random,pandas,random-sample

The number of items in your resulting sample (n attempts, each independently with probability p) has a binomial distribution and thus can rapidly be randomly generated, e.g. with numpy: sample_size = numpy.random.binomial(len(population), p) Now, the_sample = random.sample(population, sample_size) gives you exactly what you desire -- the equivalent of randomly, independently...
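
The two steps (binomial draw for the size, then an ordinary sample of that size) can be combined into one helper; a Python sketch, with the function name invented here:

```python
import random

import numpy

def independent_sample(population, p, rng=None):
    """Include each element independently with probability p by first drawing
    the sample size from a Binomial(len(population), p) distribution."""
    rng = rng or numpy.random.default_rng(0)
    size = int(rng.binomial(len(population), p))   # number of "successes"
    return random.sample(population, size)

s = independent_sample(list(range(1000)), 0.1)
```

This avoids flipping len(population) individual coins while producing the same distribution over subsets.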

If you don't have a hard dependency on mySeq really being a lazy sequence, you can just make it an array instead. let ran = System.Random(10001100) let mySeq = Array.init 10 (fun i -> ran.Next()) for time in 0..4 do for element in mySeq do printf "%O " element printf...

sql,sql-server,sample,random-sample

You want a stratified sample. I would recommend doing this by sorting the data by course code and doing an nth sample. Here is one method that works best if you have a large population size: select d.* from (select d.*, row_number() over (order by coursecode, newid()) as seqnum, count(*)...

To make these match, you need two things: the seed used to generate the random number, and the formula used to generate the random number. SAS uses for rannor (and I think also for rand, but I haven't seen confirmation of this) the following algorithm (found in Pseudo-Random Numbers: Out of...

You can try n <- 2 df[with(df, transactionID %in% sample(unique(transactionID),n, replace=FALSE)),] # transactionID desc #1 1 a #2 1 d #3 1 a #17 8 f #18 8 d data df <- structure(list(transactionID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 6L, 7L, 7L, 7L,...

I'll throw my proposed solution in here as well: # for example a = np.random.random_integers(0, 500, size=(200,1000)) N = 200 result = np.zeros((200,1000)) ia = np.arange(result.size) tw = float(np.sum(a.ravel())) result.ravel()[np.random.choice(ia, p=a.ravel()/tw, size=N, replace=False)]=1 where a is the array of weights: that is, pick the indexes for the items to change...

Here is some sample data: A <- seq_len(75) B <- rpois(75, 3) B <- B / sum(B) So now B is a probability vector for each element in A. To pull 25 samples, simply use sample(A, size = 25, replace = FALSE, prob = B). Fill the matrix as usual...