All,

I have an array as follows:

```
x=runif(1)
cdf=cumsum(c(.2,.5,.1,.05,.05,.01,.09))
>p
[1] 0.20 0.70 0.80 0.85 0.90 0.91 1.00
```

How do I return the index of the corresponding cdf entry for x? for example, .1 would return 1, .98 would return 7)

All,

I have an array as follows:

```
x=runif(1)
cdf=cumsum(c(.2,.5,.1,.05,.05,.01,.09))
>p
[1] 0.20 0.70 0.80 0.85 0.90 0.91 1.00
```

How do I return the index of the corresponding cdf entry for x? for example, .1 would return 1, .98 would return 7)

Try `findInterval`

, e.g.:

```
findInterval(c(0.1, 0.98), cdf) + 1
# [1] 1 7
```

Given your criteria -- that 322 is represented as 3 and 2045 is 20 -- how about dividing by 100 and then rounding towards 0 with trunc(). time_24hr <- c(1404, 322, 1945, 1005, 945) trunc(time_24hr / 100) ...

You can do something like this: print_test<-function(x) { Sys.sleep(x) cat("hello world") } print_test(15) If you want to execute it for a certain amount of iterations use to incorporate a 'for loop' in your function with the number of iterations....

Using dplyr for your first problem: left_join(contacts, listings, by = c("id" = "id")) %>% filter(abs(listing_date - contact_date) < 30) %>% group_by(id) %>% summarise(cnt = n()) %>% right_join(listings) And the output is: id cnt city listing_date 1 6174 2 A 2015-03-01 2 2175 3 B 2015-03-14 3 9176 1 B 2015-03-30...

As per ?zoo: Subscripting by a zoo object whose data contains logical values is undefined. So you need to wrap the subsetting in a which call: log_ret[which(!is.finite(log_ret))] <- 0 log_ret x y z s p t 2005-01-01 0.234 -0.012 0 0 0.454 0 ...

Assuming that you want to get the rowSums of columns that have 'Windows' as column names, we subset the dataset ("sep1") using grep. Then get the rowSums(Sub1), divide by the rowSums of all the numeric columns (sep1[4:7]), multiply by 100, and assign the results to a new column ("newCol") Sub1...

multivariate multiple regression can be done by lm(). This is very well documented, but here follows a little example: rawMat <- matrix(rnorm(200), ncol=2) noise <- matrix(rnorm(200, 0, 0.2), ncol=2) B <- matrix( 1:4, ncol=2) P <- t( B %*% t(rawMat)) + noise fit <- lm(P ~ rawMat) summary( fit )...

Using IRanges, you should use findOverlaps or mergeByOverlaps instead of countOverlaps. It, by default, doesn't return no matches though. I'll leave that to you. Instead, will show an alternate method using foverlaps() from data.table package: require(data.table) subject <- data.table(interval = paste("int", 1:4, sep=""), start = c(2,10,12,25), end = c(7,14,18,28)) query...

Do not use the dates in your plot, use a numeric sequence as x axis. You can use the dates as labels. Try something like this: y=GED$Mfg.Shipments.Total..USA. n=length(y) model_a1 <- auto.arima(y) plot(x=1:n,y,xaxt="n",xlab="") axis(1,at=seq(1,n,length.out=20),labels=index(y)[seq(1,n,length.out=20)], las=2,cex.axis=.5) lines(fitted(model_a1), col = 2) The result depending on your data will be something similar: ...

You can do it with rJava package. install.packages('rJava') library(rJava) .jinit() jObj=.jnew("JClass") result=.jcall(jObj,"[D","method1") Here, JClass is a Java class that should be in your ClassPath environment variable, method1 is a static method of JClass that returns double[], [D is a JNI notation for a double array. See that blog entry for...

You can put your records into a data.frame and then split by the cateogies and then run the correlation for each of the categories. sapply( split(data.frame(var1, var2), categories), function(x) cor(x[[1]],x[[2]]) ) This can look prettier with the dplyr library library(dplyr) data.frame(var1=var1, var2=var2, categories=categories) %>% group_by(categories) %>% summarize(cor= cor(var1, var2)) ...

If you read on the R help page for as.Date by typing ?as.Date you will see there is a default format assumed if you do not specify. So to specify for your data you would do nmmaps$date <- as.Date(nmmaps$date, format="%m/%d/%Y") ...

some reproducible code would allow me to give you some example code, but in the absence of that... wrap what you currently have in another if(), checking for length = 0 (or just && it, with the NULL check first), and display your favorite placeholder message....

if (length(z) %% 2) { z[-c(1, ceiling(length(z)/2), length(z))] } else z[-c(1, c(1,0) + floor(length(z)/2), length(z))] ...

I'm going with the assumption you meant "to the right" since you said "Another solution might be to drawn a polygon around the Baltic Sea and only to select the points within this polygon" # your sample data pts <- read.table(text="lat long 59.979687 29.706236 60.136177 28.148186 59.331383 22.376234 57.699154 11.667305...

You may utilize integer part of the table to store keys in order: function add(t, k, v, ...) if k ~= nil then t[k] = v t[#t+1] = k return add(t, ...) end return t end t = add({ }, "A", "hi", "B", "my", "C", "name", "D", "is") for i,k...

It looks like you're trying to grab summary functions from each entry in a list, ignoring the elements set to -999. You can do this with something like: get_scalar <- function(name, FUN=max) { sapply(mydata[,name], function(x) if(all(x == -999)) NA else FUN(as.numeric(x[x != -999]))) } Note that I've changed your function...

r,optimization,circular,maximization

I would compute all the pairs of rows in df: (pairs <- cbind(1:nrow(df), c(2:nrow(df), 1))) # [,1] [,2] # [1,] 1 2 # [2,] 2 3 # [3,] 3 4 # [4,] 4 5 # [5,] 5 6 # [6,] 6 1 You can find the best pairing with which.max:...

You can try with difftime df1$time.diff <- with(df1, difftime(time.stamp2, time.stamp1, unit='min')) df1 # time.stamp1 time.stamp2 time.diff #1 2015-01-05 15:00:00 2015-01-05 16:00:00 60 mins #2 2015-01-05 16:00:00 2015-01-05 17:00:00 60 mins #3 2015-01-05 18:00:00 2015-01-05 20:00:00 120 mins #4 2015-01-05 19:00:00 2015-01-05 20:00:00 60 mins #5 2015-01-05 20:00:00 2015-01-05 22:00:00 120...

R prefers to use i rather than j. Aslo note that complex is different than as.complex and the latter is used for conversion. You can do myStr <- "0.76+0.41j" myStr_complex <- as.complex(sub("j","i",myStr)) Im(myStr_complex) # [1] 0.41 ...

Use [[ or [ if you want to subset by string names, not $. From Hadley's Advanced R, "x$y is equivalent to x[["y", exact = FALSE]]." ## Create input input <- `names<-`(lapply(landelist, function(x) sample(0:1, 1)), landelist) filterland <- c() for (landeselect in landelist) if (input[[landeselect]] == TRUE) # use `[[`...

copy() is for copying data.table's. You are using it to copy a list. Try.. zz <- lapply(z,copy) zz[[1]][ , newColumn := 1 ] Using your original code, you will see that applying copy() to the list does not make a copy of the original data.table. They are still referenced by...

sapply iterates through the supplied vector or list and supplies each member in turn to the function. In your case, you're getting the values 2 and 4 and then trying to index your vector again using its own values. Since the oth_let1 vector has only two members, you get NA....

You can get the values with get or mget (for multiple objects) lst <- mget(myvector) lapply(seq_along(lst), function(i) write.csv(lst[[i]], file=paste(myvector[i], '.csv', sep='')) ...

The problem is that you pass the condition as a string and not as a real condition, so R can't evaluate it when you want it to. if you still want to pass it as string you need to parse and eval it in the right place for example: cond...

It's generally not a good idea to try to add rows one-at-a-time to a data.frame. it's better to generate all the column data at once and then throw it into a data.frame. For your specific example, the ifelse() function can help list<-c(10,20,5) data.frame(x=list, y=ifelse(list<8, "Greater","Less")) ...

I think this code should produce the plot you want. However, without your exact dataset, I had to generate simulated data. ## Generate dummy data and load library library(ggplot2) df4 = data.frame(Remain = rep(0:1, times = 4), Day = rep(1:4, each = 2), Genotype = rep(c("wtb", "whd"), each = 4),...

You can try cSplit library(splitstackshape) setnames(cSplit(mergedDf, 'PROD_CODE', ','), paste0('X',1:4))[] # X1 X2 X3 X4 #1: PRD0900033 PRD0900135 PRD0900220 PRD0900709 #2: PRD0900097 PRD0900550 NA NA #3: PRD0900121 NA NA NA #4: PRD0900353 NA NA NA #5: PRD0900547 PRD0900614 NA NA Or using the devel version of data.table i.e. v1.9.5 library(data.table) setDT(mergedDf)[,...

Use GetFitARpMLE(z,4) You will get > GetFitARpMLE(z,4) $loglikelihood [1] -2350.516 $phiHat ar1 ar2 ar3 ar4 0.0000000 0.0000000 0.0000000 -0.9262513 $constantTerm [1] 0.05388392 ...

You can simply use input$selectRunid like this: content(GET( "http://stats", path="gentrap/alignments", query=list(runIds=input$selectRunid, userId="dev") add_headers("X-SENTINEL-KEY"="dev"), as = "parsed")) It is probably wise to add some kind of action button and trigger download only on click....

r,function,optimization,mathematical-optimization

I think you want to minimize the square of a-fptotal ... ff <- function(x) myfun(x)^2 > optimize(ff,lower=0,upper=30000) $minimum [1] 28356.39 $objective [1] 1.323489e-23 Or find the root (i.e. where myfun(x)==0): uniroot(myfun,interval=c(0,30000)) $root [1] 28356.39 $f.root [1] 1.482476e-08 $iter [1] 4 $init.it [1] NA $estim.prec [1] 6.103517e-05 ...

A better approach would be to read the files into a list of data.frames, instead of one data.frame object per file. Assuming files is the vector of file names (as you imply above): import <- lapply(files, read.csv, header=FALSE) Then if you want to operate on each data.frame in the list...

sql,tsql,recursion,order,hierarchy

The easiest way would be to pad the keys to a fixed length. e.g. 038,007 will be ordered before 038,012 But the padding length would have to be safe for the largest taskid. Although you could keep your path trimmed for readability and create an extra padded field for sorting....

In linux, you could use awk with fread or it can be piped with read.table. Here, I changed the delimiter to , using awk pth <- '/home/akrun/file.txt' #change it to your path v1 <- sprintf("awk '/^(ID_REF|LMN)/{ matched = 1} matched {$1=$1; print}' OFS=\",\" %s", pth) and read with fread library(data.table)...

This should get you headed in the right direction, but be sure to check out the examples pointed out by @Jaap in the comments. library(ggmap) map <- get_map(location = "Mumbai", zoom = 12) df <- data.frame(location = c("Airoli", "Andheri East", "Andheri West", "Arya Nagar", "Asalfa", "Bandra East", "Bandra West"), values...

You could loop through the rows of your data, returning the column names where the data is set with an appropriate number of NA values padded at the end: `colnames<-`(t(apply(dat == 1, 1, function(x) c(colnames(dat)[x], rep(NA, 4-sum(x))))), paste("Impair", 1:4)) # Impair1 Impair2 Impair3 Impair4 # 1 "A" NA NA NA...

Or you could place a rectangle on the region of interest: rect(xleft=1994,xright = 1998,ybottom=range(CVD$cvd)[1],ytop=range(CVD$cvd)[2], density=10, col = "blue") ...

I would create a list of all your matrices using mget and ls (and some regex expression according to the names of your matrices) and then modify them all at once using lapply and colnames<- and rownames<- replacement functions. Something among these lines l <- mget(ls(patter = "m\\d+.m")) lapply(l, function(x)...

r,data.table,stata,code-translation

Your intuition is correct. collapse is the Stata equivalent of R's aggregate function, which produces a new dataset from an input dataset by applying an aggregating function (or multiple aggregating functions, one per variable) to every variable in a dataset.

It's easier to think of it in terms of the two exposures that aren't used, rather than the five that are. Let's limit the number of times an exposure can be excluded: draw_exc <- function(exposures,nexp,ng,max_excluded = 10){ nexc <- length(exposures)-nexp exp_rem <- exposures exc <- matrix(,ng,nexc) for (i in 1:ng){...

Using data.table library(data.table) setDT(df1)[, list(pages=paste(page, collapse="_")), list(user_id, date=as.Date(date, '%m/%d/%Y'))] Or using dplyr library(dplyr) df1 %>% group_by(user_id, date=as.Date(date, '%m/%d/%Y')) %>% summarise(pages=paste(page, collapse='_')) ...

r,conditional,subset,find-occurrences

Here's another possible data.table solution library(data.table) setDT(df1)[, list(Value = c("uncensored", "censored"), Time = c(Time[match("uncensored", Value)], Time[(.N - match("uncensored", rev(Value))) + 2L])), by = ID] # ID Value Time # 1: 1 uncensored 3 # 2: 1 censored 5 # 3: 2 uncensored 2 # 4: 2 censored 5 Or similarly,...

r,if-statement,recursion,vector,integer

Your sapply call is applying fun across all values of x, when you really want it to be applying across all values of i. To get the sapply to do what I assume you want to do, you can do the following: sapply(X = 1:length(x), FUN = fun, x =...

r,string-split,stemming,text-analysis

Given a list of English words you can do this pretty simply by looking up every possible split of the word in the list. I'll use the first Google hit I found for my word list, which contains about 70k lower-case words: wl <- read.table("http://www-personal.umich.edu/~jlawler/wordlist")$V1 check.word <- function(x, wl) {...

Here's a solution for extracting the article lines only. Turned out much more complex and cryptic than I'd been hoping, but I'm pretty sure it works. Also, thanks to akrun for the test data. v1 <- c('ard','b','','','','rr','','fr','','','','','gh','d'); ind <-...

If you only have 4 GBs of RAM you cannot put 5 GBs of data 'into R'. You can alternatively look at the 'Large memory and out-of-memory data' section of the High Perfomance Computing task view in R. Packages designed for out-of-memory processes such as ff may help you. Otherwise...

Change the panel.margin argument to panel.margin = unit(c(-0.5,0-0.5,0), "lines"). For some reason the top and bottom margins need to be negative to line up perfectly. Here is the result: ...

You can create a similar plot in ggplot, but you will need to do some reshaping of the data first. library(reshape2) #ggplot needs a dataframe data <- as.data.frame(data) #id variable for position in matrix data$id <- 1:nrow(data) #reshape to long format plot_data <- melt(data,id.var="id") #plot ggplot(plot_data, aes(x=id,y=value,group=variable,colour=variable)) + geom_point()+ geom_line(aes(lty=variable))...

You can try library(data.table)#v1.9.4+ setDT(yourdf)[, .N, by = A] ...

Combining the example by @Robert and code from the answer featured here: How to get a reversed, log10 scale in ggplot2? library("scales") library(ggplot2) reverselog_trans <- function(base = exp(1)) { trans <- function(x) -log(x, base) inv <- function(x) base^(-x) trans_new(paste0("reverselog-", format(base)), trans, inv, log_breaks(base = base), domain = c(1e-100, Inf)) }...

You are just saving a map into variable and not displaying it. Just do library(ggmap) map <- qmap('Anaheim', zoom = 10, maptype = 'roadmap') map Or library(ggmap) qmap('Anaheim', zoom = 10, maptype = 'roadmap') ...