r,conditional,subset,find-occurrences

Here's another possible data.table solution library(data.table) setDT(df1)[, list(Value = c("uncensored", "censored"), Time = c(Time[match("uncensored", Value)], Time[(.N - match("uncensored", rev(Value))) + 2L])), by = ID] # ID Value Time # 1: 1 uncensored 3 # 2: 1 censored 5 # 3: 2 uncensored 2 # 4: 2 censored 5 Or similarly,...

Use addNA to treat NA as a distinct level of x. > temp.df$x <- addNA(temp.df$x) > aggregate(count ~ x + y, data=temp.df, FUN=sum, na.rm=FALSE, na.action=na.pass) x y count 1 1 A 2 2 <NA> A 2 3 3 B 1 4 10 B 1 ...

Use GetFitARpMLE(z,4) You will get > GetFitARpMLE(z,4) $loglikelihood [1] -2350.516 $phiHat ar1 ar2 ar3 ar4 0.0000000 0.0000000 0.0000000 -0.9262513 $constantTerm [1] 0.05388392 ...

You can put your records into a data.frame and then split by the cateogies and then run the correlation for each of the categories. sapply( split(data.frame(var1, var2), categories), function(x) cor(x[[1]],x[[2]]) ) This can look prettier with the dplyr library library(dplyr) data.frame(var1=var1, var2=var2, categories=categories) %>% group_by(categories) %>% summarize(cor= cor(var1, var2)) ...

Internally it does this (which does not involve zoo): y <- c(NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.661698, 3.107128, 7.319669, 10.800864, 17.855491, 18.250267, 28.587002, 36.405397, 38.467383, 38.685956, 43.917737, 40.829615, 43.519173, 45.597497, 43.252656, 45.581646, 48.258325, 48.269969, 50.905045, 53.258165, 58.39137, 59.27844, 58.720518, 56.933438, 62.062116, 59.860849,...

Assuming that you want to get the rowSums of columns that have 'Windows' as column names, we subset the dataset ("sep1") using grep. Then get the rowSums(Sub1), divide by the rowSums of all the numeric columns (sep1[4:7]), multiply by 100, and assign the results to a new column ("newCol") Sub1...

Or you could place a rectangle on the region of interest: rect(xleft=1994,xright = 1998,ybottom=range(CVD$cvd)[1],ytop=range(CVD$cvd)[2], density=10, col = "blue") ...

Here's a solution for extracting the article lines only. Turned out much more complex and cryptic than I'd been hoping, but I'm pretty sure it works. Also, thanks to akrun for the test data. v1 <- c('ard','b','','','','rr','','fr','','','','','gh','d'); ind <-...

You are just saving a map into variable and not displaying it. Just do library(ggmap) map <- qmap('Anaheim', zoom = 10, maptype = 'roadmap') map Or library(ggmap) qmap('Anaheim', zoom = 10, maptype = 'roadmap') ...

Try split(vec,cumsum(c(1, abs(diff(vec))))) #$`1` #[1] 1 1 1 1 1 1 #$`2` #[1] 0 0 0 0 0 0 0 0 0 0 #$`3` #[1] 1 1 1 1 1 1 1 1 1 1 1 #$`4` #[1] 0 0 0 0 Or use rle split(vec,inverse.rle(within.list(rle(vec), values <- seq_along(values)))) If...

I'm going with the assumption you meant "to the right" since you said "Another solution might be to drawn a polygon around the Baltic Sea and only to select the points within this polygon" # your sample data pts <- read.table(text="lat long 59.979687 29.706236 60.136177 28.148186 59.331383 22.376234 57.699154 11.667305...

Two small changes: mvad_long$id <- as.factor(mvad_long$id) ggplot(data=mvad_long,aes(x=Month,y=id,fill=state))+ geom_tile()+facet_wrap(~cluster,scales = "free_y") ggplot was treating id as a numerical variable, rather than a factor, and then the scales were fixed....

r,arguments,string-matching,agrep

From the help file: If ‘cost’ is not given, ‘all’ defaults to 10%, and the other transformation number bounds default to ‘all’. As far as I understand it means that either cost or all is a limiting factor even if you set del, ins and sub. If you want to...

Here's a recommended way to ask a question, focusing on the fact that your actual data is too big, too complicated, or too private to share. Question: how to apply a function on each row of a data.frame? My data: # make up some data s <- "Lorem ipsum dolor...

The cause of the error: At the beginning of the function call, the elements of value all have class "character". But when you hit value[value=="secondary"] <- label_secondary a bunch of those elements get replaced by expressions. So when you then try to do value[value=="primary"] <- label_primary R is trying to...

You can try cSplit library(splitstackshape) setnames(cSplit(mergedDf, 'PROD_CODE', ','), paste0('X',1:4))[] # X1 X2 X3 X4 #1: PRD0900033 PRD0900135 PRD0900220 PRD0900709 #2: PRD0900097 PRD0900550 NA NA #3: PRD0900121 NA NA NA #4: PRD0900353 NA NA NA #5: PRD0900547 PRD0900614 NA NA Or using the devel version of data.table i.e. v1.9.5 library(data.table) setDT(mergedDf)[,...

If I understand correctly you want to evaluate the first expression with the first value of x, the second with the second etc. You could do: mapply(function(ex, x) eval(ex, envir = list(x = x)), funs.list[1:2], c(7, 60)) ...

Try library(stringr) str_extract(word, '.*(?=\\.csv)') #[1] "dirtyboards" Another option which works for the example provided (and not very specific) str_extract(word, '^[^.]+') #[1] "dirtyboards" Update Including 'foo.csv.csv', word1 <- c("dirtyboards.csv" , "boardcsv.csv", "foo.csv.csv") str_extract(word1, '.*(?=\\.csv$)') #[1] "dirtyboards" "boardcsv" "foo.csv" ...

If you melt your data.table to long format, this is easy: library(reshape2) news1 <- melt(news, id.vars = "ID") news2 <- news1[abs(value) > 0.01,] # ID variable value #1: 8 diff.jan 0.101 #2: 202 diff.apr 10.000 #3: 203 diff.apr 11.000 #4: 50 diff.aug 0.221 dcast.data.table(news2, ID ~ variable) # ID diff.jan...

R prefers to use i rather than j. Aslo note that complex is different than as.complex and the latter is used for conversion. You can do myStr <- "0.76+0.41j" myStr_complex <- as.complex(sub("j","i",myStr)) Im(myStr_complex) # [1] 0.41 ...

The problem arises from you mixture of subsetting types here: df$target[which(df$snakes=='a'),] Once you use $ the output is no longer a data.frame, and the two parameter [ subsetting is no longer valid. You are better off compacting it to: sum(df[df$snakes=="a","target"]) [1] 23 As for your model, you can just create...

r,function,optimization,mathematical-optimization

I think you want to minimize the square of a-fptotal ... ff <- function(x) myfun(x)^2 > optimize(ff,lower=0,upper=30000) $minimum [1] 28356.39 $objective [1] 1.323489e-23 Or find the root (i.e. where myfun(x)==0): uniroot(myfun,interval=c(0,30000)) $root [1] 28356.39 $f.root [1] 1.482476e-08 $iter [1] 4 $init.it [1] NA $estim.prec [1] 6.103517e-05 ...

There is nothing wrong with the >=, your problem is that 1 is not really one. Try this Ax >= 1 [1] FALSE Ax == 1 [1] FALSE and format(Ax, digits = 20) [1] "0.99999999999999977796" Edit: A possible Solution As solutions to your problems you can return the final result...

r,optimization,circular,maximization

I would compute all the pairs of rows in df: (pairs <- cbind(1:nrow(df), c(2:nrow(df), 1))) # [,1] [,2] # [1,] 1 2 # [2,] 2 3 # [3,] 3 4 # [4,] 4 5 # [5,] 5 6 # [6,] 6 1 You can find the best pairing with which.max:...

You can try library(data.table)#v1.9.4+ setDT(yourdf)[, .N, by = A] ...

In linux, you could use awk with fread or it can be piped with read.table. Here, I changed the delimiter to , using awk pth <- '/home/akrun/file.txt' #change it to your path v1 <- sprintf("awk '/^(ID_REF|LMN)/{ matched = 1} matched {$1=$1; print}' OFS=\",\" %s", pth) and read with fread library(data.table)...

r,mathematical-optimization,quadprog,quadratic-programming

You can do this with the solve.QP function from quadprog. From ?solve.QP, we read that solve.QP solves systems of the form min_b {-d'b + 0.5 b'Db | A'b >= b0}. You are solving a problem of the form min_w {-A'w + pw'Cw | w >= 0, 1'w = 1}. Thus,...

r,data.table,stata,code-translation

Your intuition is correct. collapse is the Stata equivalent of R's aggregate function, which produces a new dataset from an input dataset by applying an aggregating function (or multiple aggregating functions, one per variable) to every variable in a dataset.

A better approach would be to read the files into a list of data.frames, instead of one data.frame object per file. Assuming files is the vector of file names (as you imply above): import <- lapply(files, read.csv, header=FALSE) Then if you want to operate on each data.frame in the list...

It's generally not a good idea to try to add rows one-at-a-time to a data.frame. it's better to generate all the column data at once and then throw it into a data.frame. For your specific example, the ifelse() function can help list<-c(10,20,5) data.frame(x=list, y=ifelse(list<8, "Greater","Less")) ...

Here is some sample code based on what you had in your original problem which will aggregate Twitter results for a set of users: # create a data frame with 4 columns and no rows initially df_result <- data.frame(t(rep(NA, 4))) names(df_result) <- c('id', 'name', 's_name', 'fol_count') df_result <- df_result[0:0,] #...

Do not use the dates in your plot, use a numeric sequence as x axis. You can use the dates as labels. Try something like this: y=GED$Mfg.Shipments.Total..USA. n=length(y) model_a1 <- auto.arima(y) plot(x=1:n,y,xaxt="n",xlab="") axis(1,at=seq(1,n,length.out=20),labels=index(y)[seq(1,n,length.out=20)], las=2,cex.axis=.5) lines(fitted(model_a1), col = 2) The result depending on your data will be something similar: ...

In the link that I mentioned in the comment, you can find solutions using RCurl and httr package. Here, I provide the solution using rvest package. library(rvest) kk<-html("http://en.wikipedia.org/wiki/List_of_S%26P_500_companies")%>% html_table(fill=TRUE)%>% .[[1]] //table 1 only head(kk) Ticker symbol Security SEC filings GICS Sector GICS Sub Industry Address of Headquarters 1 MMM 3M...

If you need the comments, you still can replace the 6th comma with a semicolon and use your previous solution: gsub("((?:[^,]*,){5}[^,]*),", "\\1;", vec1, perl=TRUE) Regex explanation: ((?:[^,]*,){5}[^,]*) - a capturing group that we will reference to as Group 1 with \\1 in the replacement pattern, matching (?:[^,]*,){5} - 5 sequences...

Try rl <- with(mydf, rle(x >y)) grp <- inverse.rle(within.list(rl , values <- seq_along(values))) split(mydf, grp) #$`1` # x y #1 1 10 #2 2 9 #3 3 8 #4 4 7 #5 5 6 #$`2` # x y #6 6 5 #7 7 4 #8 8 3 #9 9 2...

Using IRanges, you should use findOverlaps or mergeByOverlaps instead of countOverlaps. It, by default, doesn't return no matches though. I'll leave that to you. Instead, will show an alternate method using foverlaps() from data.table package: require(data.table) subject <- data.table(interval = paste("int", 1:4, sep=""), start = c(2,10,12,25), end = c(7,14,18,28)) query...

It looks like you're trying to grab summary functions from each entry in a list, ignoring the elements set to -999. You can do this with something like: get_scalar <- function(name, FUN=max) { sapply(mydata[,name], function(x) if(all(x == -999)) NA else FUN(as.numeric(x[x != -999]))) } Note that I've changed your function...

As per ?zoo: Subscripting by a zoo object whose data contains logical values is undefined. So you need to wrap the subsetting in a which call: log_ret[which(!is.finite(log_ret))] <- 0 log_ret x y z s p t 2005-01-01 0.234 -0.012 0 0 0.454 0 ...

Use [[ or [ if you want to subset by string names, not $. From Hadley's Advanced R, "x$y is equivalent to x[["y", exact = FALSE]]." ## Create input input <- `names<-`(lapply(landelist, function(x) sample(0:1, 1)), landelist) filterland <- c() for (landeselect in landelist) if (input[[landeselect]] == TRUE) # use `[[`...

If you're willing to give xml2 a go, you can get to begin in a few lines: library(xml2) library(magrittr) # get a vector doc <- read_xml("~/Dropbox/Data.xml") doc %>% xml_find_all("//d1:event/d1:begin", ns=xml_ns(doc)) %>% xml_text() %>% as.numeric() ## [1] 0.24 0.73 1.25 1.75 2.24 2.75 3.27 3.76 4.30 4.77 5.28 5.78 6.32 6.82...

r,nested,time-series,lapply,sapply

Using plyr: As a matrix (time in cols, rows corresponding to rows of df): aaply(df, 1, function(x) weisurv(t, x$sc, x$shp), .expand = FALSE) As a list: alply(df, 1, function(x) weisurv(t, x$sc, x$shp)) As a data frame (structure as per matrix above): adply(df, 1, function(x) setNames(weisurv(t, x$sc, x$shp), t)) As a...

Reusing the chunks in a slightly different way (thanks @george-dontas) gets me what I want. The calculated values before the R and the R with its discussion in the appendix. \documentclass{article} \begin{document} \title{For Bosses and R Experts} \author{Joe Collins} \maketitle <<*, echo=FALSE, include=FALSE>>= <<data>> <<chart>> <<statistics>> @ \section{For the Boss}...

You're almost there. As @BondedDust suggests, it's not practical to use a two-level factor (Trap) as a random effect; in fact, it doesn't seem right in principle either (the levels of Trap are not arbitrary/randomly chosen/exchangeable). When I tried a model with quadratic altitude, fixed effect of trap, and random...

input is just a reactivevalues object so you can use [[: print(input[[a]]) ...

r,string-split,stemming,text-analysis

Given a list of English words you can do this pretty simply by looking up every possible split of the word in the list. I'll use the first Google hit I found for my word list, which contains about 70k lower-case words: wl <- read.table("http://www-personal.umich.edu/~jlawler/wordlist")$V1 check.word <- function(x, wl) {...

r,if-statement,recursion,vector,integer

Your sapply call is applying fun across all values of x, when you really want it to be applying across all values of i. To get the sapply to do what I assume you want to do, you can do the following: sapply(X = 1:length(x), FUN = fun, x =...

This should get you headed in the right direction, but be sure to check out the examples pointed out by @Jaap in the comments. library(ggmap) map <- get_map(location = "Mumbai", zoom = 12) df <- data.frame(location = c("Airoli", "Andheri East", "Andheri West", "Arya Nagar", "Asalfa", "Bandra East", "Bandra West"), values...

If you read on the R help page for as.Date by typing ?as.Date you will see there is a default format assumed if you do not specify. So to specify for your data you would do nmmaps$date <- as.Date(nmmaps$date, format="%m/%d/%Y") ...

You can do something like this: print_test<-function(x) { Sys.sleep(x) cat("hello world") } print_test(15) If you want to execute it for a certain amount of iterations use to incorporate a 'for loop' in your function with the number of iterations....

python,sql,r,graph,connected-components

In R, you could use the package igraph: library(igraph) gg <- graph.edgelist(as.matrix(d), directed=F) split(V(gg)$name, clusters(gg)$membership) #$`1` #[1] "a" "b" "c" "u" "e" # #$`2` #[1] "f" "g" "h" "j" # #$`3` #[1] "z" "y" And you can look at the graph using: plot(gg) This is based on an excellent answer...

You can try with difftime df1$time.diff <- with(df1, difftime(time.stamp2, time.stamp1, unit='min')) df1 # time.stamp1 time.stamp2 time.diff #1 2015-01-05 15:00:00 2015-01-05 16:00:00 60 mins #2 2015-01-05 16:00:00 2015-01-05 17:00:00 60 mins #3 2015-01-05 18:00:00 2015-01-05 20:00:00 120 mins #4 2015-01-05 19:00:00 2015-01-05 20:00:00 60 mins #5 2015-01-05 20:00:00 2015-01-05 22:00:00 120...

If you only have 4 GBs of RAM you cannot put 5 GBs of data 'into R'. You can alternatively look at the 'Large memory and out-of-memory data' section of the High Perfomance Computing task view in R. Packages designed for out-of-memory processes such as ff may help you. Otherwise...

Assuming your workingNational data doesn't have gaps or other irregularities, you could look up the location of each ad time in workingNational and then just take the five entries leading up to that time: indices <- match(tvNationalSale$Ad.Time, workingNational$datetime) tvNationalSale$fiveMinutesBefore <- rowSums(sapply(1:5, function(x) workingNational$sessions[indices-x])) head(tvNationalSale) # Ad.Time fiveMinutesBefore # 1 2015-01-03...

The problem is that you pass the condition as a string and not as a real condition, so R can't evaluate it when you want it to. if you still want to pass it as string you need to parse and eval it in the right place for example: cond...

multivariate multiple regression can be done by lm(). This is very well documented, but here follows a little example: rawMat <- matrix(rnorm(200), ncol=2) noise <- matrix(rnorm(200, 0, 0.2), ncol=2) B <- matrix( 1:4, ncol=2) P <- t( B %*% t(rawMat)) + noise fit <- lm(P ~ rawMat) summary( fit )...

if (length(z) %% 2) { z[-c(1, ceiling(length(z)/2), length(z))] } else z[-c(1, c(1,0) + floor(length(z)/2), length(z))] ...

You can simply use input$selectRunid like this: content(GET( "http://stats", path="gentrap/alignments", query=list(runIds=input$selectRunid, userId="dev") add_headers("X-SENTINEL-KEY"="dev"), as = "parsed")) It is probably wise to add some kind of action button and trigger download only on click....

You can do it with rJava package. install.packages('rJava') library(rJava) .jinit() jObj=.jnew("JClass") result=.jcall(jObj,"[D","method1") Here, JClass is a Java class that should be in your ClassPath environment variable, method1 is a static method of JClass that returns double[], [D is a JNI notation for a double array. See that blog entry for...

You could loop through the rows of your data, returning the column names where the data is set with an appropriate number of NA values padded at the end: `colnames<-`(t(apply(dat == 1, 1, function(x) c(colnames(dat)[x], rep(NA, 4-sum(x))))), paste("Impair", 1:4)) # Impair1 Impair2 Impair3 Impair4 # 1 "A" NA NA NA...

It is actually simple to do this with data.table. Recreating your sample data: test.data <- read.table( text = " ID COUNT SCORE VALUE ORIG_DATE CLOSE_DATE 10748 3 750 450231 2015-03-01 2015-06-01 10845 4 680 590231 2015-01-01 2015-05-01 21758 7 760 650839 2014-11-01 2015-06-01", header = TRUE, stringsAsFactors = FALSE, colClasses...

You can create a similar plot in ggplot, but you will need to do some reshaping of the data first. library(reshape2) #ggplot needs a dataframe data <- as.data.frame(data) #id variable for position in matrix data$id <- 1:nrow(data) #reshape to long format plot_data <- melt(data,id.var="id") #plot ggplot(plot_data, aes(x=id,y=value,group=variable,colour=variable)) + geom_point()+ geom_line(aes(lty=variable))...

You can get the values with get or mget (for multiple objects) lst <- mget(myvector) lapply(seq_along(lst), function(i) write.csv(lst[[i]], file=paste(myvector[i], '.csv', sep='')) ...

You can use findInterval in combination with by; by(df,findInterval(df$Y,quantile(df$Y,c(0.25,0.5,0.75))),estFun) ...

This will give what you want: Foo <- function(x){ if (length(ll[[x]])>1) { print(x) x <- mean(ll[[x]]) } else { x <- ll[[x]] } } ll_new <- lapply(setNames(names(ll), names(ll)), Foo) # [1] "B" ll_new # $A # [1] 10 # # $B # [1] 25 # # $C # [1] 40...

I think this code should produce the plot you want. However, without your exact dataset, I had to generate simulated data. ## Generate dummy data and load library library(ggplot2) df4 = data.frame(Remain = rep(0:1, times = 4), Day = rep(1:4, each = 2), Genotype = rep(c("wtb", "whd"), each = 4),...

An option using data.table library(data.table) setDT(df1)[, Count:=.N, ID] # ID category Count #1: 101 A 3 #2: 101 B 3 #3: 101 C 3 #4: 102 A 1 #5: 103 B 2 #6: 103 C 2 Or using dplyr library(dplyr) df1 %>% group_by(ID) %>% mutate(Count=n()) Or using base R df1$Count...

In that case, I would use subsetting: v[v>2.2 & v<2.6] or which(v>2.2 & v<2.6) depending on if you want the values or the index...

It's easier to think of it in terms of the two exposures that aren't used, rather than the five that are. Let's limit the number of times an exposure can be excluded: draw_exc <- function(exposures,nexp,ng,max_excluded = 10){ nexc <- length(exposures)-nexp exp_rem <- exposures exc <- matrix(,ng,nexc) for (i in 1:ng){...

I don't see how this is solvable using melt, but you can use a simple rbind here, for example res <- rbind(DT[, c(1,2:3), with = FALSE], DT[, c(1,4:5), with = FALSE], use.names = FALSE)[service1 != ""] res # customer service1 fee1 # 1: 1 1 100 # 2: 2 3...

you can try this ll[order(sapply(ll, FUN = function(x) x[1]))] [[1]] [1] "2015-01-01" "2015-01-10" [[2]] [1] "2015-02-01" "2015-02-10" [[3]] [1] "2015-03-01" "2015-03-10" and from Akrun's comment ll[order(sapply(ll, `[[`, 1))] ...

Try indx <- as.numeric(sub('.*g', '', dat_name[,1])) data1 <- ex.data.frame data1[] <- lapply(ex.data.frame, function(x) dat_name[,1][match(x, indx)]) data1 # c1 c2 c3 #1 At5g003 At5g002 At5g001 #2 At5g004 At5g005 At5g002 #3 At5g001 <NA> At5g003 #4 <NA> <NA> At5g004 #5 <NA> <NA> At5g005 EDIT If the strings as random, you could do indx...

sapply iterates through the supplied vector or list and supplies each member in turn to the function. In your case, you're getting the values 2 and 4 and then trying to index your vector again using its own values. Since the oth_let1 vector has only two members, you get NA....

Set renderer to canvas in set_options: library(ggvis) mtcars %>% ggvis(~wt, ~mpg) %>% layer_points() %>% set_options(width = 300, height = 200, padding = padding(10, 10, 10, 10), renderer = "canvas") ...

Try library(reshape2) df1 <- transform(df, result=as.character(result), red= factor(red, levels= unique(red))) dcast(df1, mult~red, value.var='result', fill='')[-1] # 1 0.9 0.8 0.7 #1 value1 #2 value2 #3 value3 #4 value4 ...

Using data.table library(data.table) setDT(df1)[, list(pages=paste(page, collapse="_")), list(user_id, date=as.Date(date, '%m/%d/%Y'))] Or using dplyr library(dplyr) df1 %>% group_by(user_id, date=as.Date(date, '%m/%d/%Y')) %>% summarise(pages=paste(page, collapse='_')) ...

Try library(dplyr) as.data.frame(m1) %>% group_by(id)%>% summarise_each(funs(max=max(., na.rm=TRUE))) # id sample1 sample2 sample3 #1 1 21282.5 3342 22202 #2 2 18558.0 3047 NA #3 3 3709.0 2338 5709 #4 4 1.0 2 1 Or aggregate(.~id, as.data.frame(m1), FUN= max, na.rm=TRUE, na.action=NULL) NOTE: I am guessing you have real NAs in the dataset...

ff and ffbase offer out of memory R vectors, but introduce a reference semantics which can give problems with R idioms. R is a functional programming language, meaning that functions do not change parameters and objects, but return modified copies. In ffbase we implement functions in the R way, i.e....

To get complete cases, use this: complete_df <- df[complete.cases(df),] complete.cases returns a logical vector that tells you which rows of dataframe df are complete, and you can use that to subset the data. To replace the NAs, you can use this: new_df <- df new_df[is.na()] <- 'Unknown' But this has...

Principal == input$selectPrincipal | input$selectPrincipal == "All" ...

If you want separate bars for each gear, then you should add fill=gear to the aes in geom_bar: ggplot(cdata[cdata$year==2012 & cdata$sitecode==678490,], aes(x = factor(month), y = totalvalue, fill=gear)) + geom_bar(stat = "identity", position="dodge") + labs(x = "Month", y = "Total value") this gives: When you want to make a plot...

Creating a distance matrix between n = 133763 observations requires (n^2-n)/2 pairwise comparisons. Given that a scalar numeric requires 12 bytes of RAM the entire matrix requires about 100 GB. So unfortunately you don't have enough. Algorithms based on distance matrices scale very poorly with increased data set size (since...

Given your criteria -- that 322 is represented as 3 and 2045 is 20 -- how about dividing by 100 and then rounding towards 0 with trunc(). time_24hr <- c(1404, 322, 1945, 1005, 945) trunc(time_24hr / 100) ...

copy() is for copying data.table's. You are using it to copy a list. Try.. zz <- lapply(z,copy) zz[[1]][ , newColumn := 1 ] Using your original code, you will see that applying copy() to the list does not make a copy of the original data.table. They are still referenced by...

In the context of your code sample(rep(Schools$School.ID, each = 6)) gives a random sequence of schools where each school.id appears 6 times. Set Teachers$AssignedSchool to this sample and each teacher has an assigned school...

x<-c('AAIT', 'AAL', 'AAME') kk<-lapply(x,function(i) download.file(paste0("http://ichart.finance.yahoo.com/table.csv?s=",i),paste0(i,".csv"))) if you want to directly read the file: jj<- lapply(x,function(i) read.csv(paste0("http://ichart.finance.yahoo.com/table.csv?s=",i))) ...

Apparently you want this: A[rowSums(A != 0) == 0,] <- 1/ncol(A) # [,1] [,2] [,3] [,4] [,5] [,6] [,7] #[1,] 0.0000000 0.0000000 1.0000000 0.0000000 0.0000000 0.0000000 0.0000000 #[2,] 1.0000000 0.0000000 1.0000000 1.0000000 0.0000000 0.0000000 0.0000000 #[3,] 0.0000000 0.0000000 0.0000000 1.0000000 1.0000000 0.0000000 0.0000000 #[4,] 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571...

Using dplyr for your first problem: left_join(contacts, listings, by = c("id" = "id")) %>% filter(abs(listing_date - contact_date) < 30) %>% group_by(id) %>% summarise(cnt = n()) %>% right_join(listings) And the output is: id cnt city listing_date 1 6174 2 A 2015-03-01 2 2175 3 B 2015-03-14 3 9176 1 B 2015-03-30...

Combining the example by @Robert and code from the answer featured here: How to get a reversed, log10 scale in ggplot2? library("scales") library(ggplot2) reverselog_trans <- function(base = exp(1)) { trans <- function(x) -log(x, base) inv <- function(x) base^(-x) trans_new(paste0("reverselog-", format(base)), trans, inv, log_breaks(base = base), domain = c(1e-100, Inf)) }...

We can use one of the aggregating functions. Using data.table, we convert the 'data.frame' to 'data.table' (setDT(input)), grouped by 'user.id', we create an 'indicator' variable by checking the elements in 'user_type' that are 'new' (user_type=='new') and at the same time meets the condition that it is the first observation ((1:.N)==1L)),...

You can loop using names of the list object and save lapply(names(mylistdf), function(x) { x1 <- mylistdf[[x]] save(x1, file=paste0(getwd(),'/', x, '.RData')) }) ...

r,string-matching,tm,agrep,qdap

I have written a function for this, not the most optimized way to do it but this will do the task. the inputs are vectors not lists, hope this helps stringMatch<-function(search.string,inputstring,pattern=" "){ stringsplit<-unlist(str_split(search.string,pattern)) firstletter<-c() for(i in seq(1,length(stringsplit))){firstletter<-paste(firstletter, substring(stringsplit[i],1,1),sep="")} search.string.l<-tolower(search.string) firstletter.l<-tolower(firstletter)...

I would create a list of all your matrices using mget and ls (and some regex expression according to the names of your matrices) and then modify them all at once using lapply and colnames<- and rownames<- replacement functions. Something among these lines l <- mget(ls(patter = "m\\d+.m")) lapply(l, function(x)...

some reproducible code would allow me to give you some example code, but in the absence of that... wrap what you currently have in another if(), checking for length = 0 (or just && it, with the NULL check first), and display your favorite placeholder message....

Try combn(v1, 2, FUN=function(x) paste(rev(x), collapse="-")) #[1] "B-A" "C-A" "D-A" "E-A" "C-B" "D-B" "E-B" "D-C" "E-C" "E-D" If you want in the default order combn(v1, 2, FUN=paste, collapse="-") #[1] "A-B" "A-C" "A-D" "A-E" "B-C" "B-D" "B-E" "C-D" "C-E" "D-E" Update For a faster option, you can use combnPrim from grBase....

r,regression,decision-tree,non-linear-regression

Simulate some data to make a reproducible example: A=data.frame(ads_return_count=sample(100,10,TRUE), actual_cpc=runif(100), is_user_agent_bot=factor(rep("False",100))) cubist(A[,c("ads_return_count","is_user_agent_bot")],A[,"actual_cpc"]) cubist code called exit with value 1 Error in strsplit(tmp, "\"")[[1]] : subscript out of bounds Great, now we're on the same page. What bothers me is that the second argument, the outcome, is all "False". I'm not...

Change the panel.margin argument to panel.margin = unit(c(-0.5,0-0.5,0), "lines"). For some reason the top and bottom margins need to be negative to line up perfectly. Here is the result: ...

You have called the argument costs and not cost. Here's an example using the sample data in ?svm so you can try this: model <- svm(Species ~ ., data = iris, cost=.6) model$cost # [1] 0.6 model <- svm(Species ~ ., data = iris, costs=.6) model$cost # [1] 1 R...