sapply iterates through the supplied vector or list and supplies each member in turn to the function. In your case, you're getting the values 2 and 4 and then trying to index your vector again using its own values. Since the oth_let1 vector has only two members, you get NA....
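A minimal sketch of the behaviour described above. The contents of oth_let1 are assumed from the description (two members, with values 2 and 4); the original vector is not shown.

```r
# Hypothetical vector holding the two values mentioned above
oth_let1 <- c(2, 4)

# sapply hands the VALUES (2, then 4) to the function, not the positions 1 and 2,
# so the vector ends up being indexed at positions 2 and 4
by_value <- sapply(oth_let1, function(i) oth_let1[i])

# Iterating over positions instead returns each member unchanged
by_position <- sapply(seq_along(oth_let1), function(i) oth_let1[i])
```

Here by_value contains the second member followed by NA, because the vector has no fourth member, while by_position reproduces the vector.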

ff and ffbase offer out-of-memory R vectors, but introduce reference semantics, which can cause problems with R idioms. R is a functional programming language, meaning that functions do not change parameters and objects, but return modified copies. In ffbase we implement functions in the R way, i.e....

It is actually simple to do this with data.table. Recreating your sample data: test.data <- read.table( text = " ID COUNT SCORE VALUE ORIG_DATE CLOSE_DATE 10748 3 750 450231 2015-03-01 2015-06-01 10845 4 680 590231 2015-01-01 2015-05-01 21758 7 760 650839 2014-11-01 2015-06-01", header = TRUE, stringsAsFactors = FALSE, colClasses...

You can put your records into a data.frame and then split by the categories and then run the correlation for each of the categories. sapply( split(data.frame(var1, var2), categories), function(x) cor(x[[1]],x[[2]]) ) This can look prettier with the dplyr library library(dplyr) data.frame(var1=var1, var2=var2, categories=categories) %>% group_by(categories) %>% summarize(cor= cor(var1, var2)) ...

I think this code should produce the plot you want. However, without your exact dataset, I had to generate simulated data. ## Generate dummy data and load library library(ggplot2) df4 = data.frame(Remain = rep(0:1, times = 4), Day = rep(1:4, each = 2), Genotype = rep(c("wtb", "whd"), each = 4),...

You can simply use input$selectRunid like this: content(GET( "http://stats", path="gentrap/alignments", query=list(runIds=input$selectRunid, userId="dev"), add_headers("X-SENTINEL-KEY"="dev")), as = "parsed") It is probably wise to add some kind of action button and trigger download only on click....

You're almost there. As @BondedDust suggests, it's not practical to use a two-level factor (Trap) as a random effect; in fact, it doesn't seem right in principle either (the levels of Trap are not arbitrary/randomly chosen/exchangeable). When I tried a model with quadratic altitude, fixed effect of trap, and random...

input is just a reactivevalues object so you can use [[: print(input[[a]]) ...

r,regression,decision-tree,non-linear-regression

Simulate some data to make a reproducible example: A=data.frame(ads_return_count=sample(100,10,TRUE), actual_cpc=runif(100), is_user_agent_bot=factor(rep("False",100))) cubist(A[,c("ads_return_count","is_user_agent_bot")],A[,"actual_cpc"]) cubist code called exit with value 1 Error in strsplit(tmp, "\"")[[1]] : subscript out of bounds Great, now we're on the same page. What bothers me is that the second argument, the outcome, is all "False". I'm not...

To get complete cases, use this: complete_df <- df[complete.cases(df),] complete.cases returns a logical vector that tells you which rows of dataframe df are complete, and you can use that to subset the data. To replace the NAs, you can use this: new_df <- df new_df[is.na(new_df)] <- 'Unknown' But this has...
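A small self-contained sketch of both steps. The data frame below is made up for illustration, and its columns are kept as character (not factor) so that the replacement value can be assigned directly.

```r
# Hypothetical data with one missing value
df <- data.frame(id = 1:3,
                 grp = c("a", NA, "c"),
                 stringsAsFactors = FALSE)

# Keep only the rows without any NA
complete_df <- df[complete.cases(df), ]

# Replace every NA with a placeholder; is.na() needs the data frame as its argument
new_df <- df
new_df[is.na(new_df)] <- "Unknown"
```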

Try library(reshape2) df1 <- transform(df, result=as.character(result), red= factor(red, levels= unique(red))) dcast(df1, mult~red, value.var='result', fill='')[-1] # 1 0.9 0.8 0.7 #1 value1 #2 value2 #3 value3 #4 value4 ...

I don't see how this is solvable using melt, but you can use a simple rbind here, for example res <- rbind(DT[, c(1,2:3), with = FALSE], DT[, c(1,4:5), with = FALSE], use.names = FALSE)[service1 != ""] res # customer service1 fee1 # 1: 1 1 100 # 2: 2 3...

r,string-matching,tm,agrep,qdap

I have written a function for this, not the most optimized way to do it but this will do the task. the inputs are vectors not lists, hope this helps stringMatch<-function(search.string,inputstring,pattern=" "){ stringsplit<-unlist(str_split(search.string,pattern)) firstletter<-c() for(i in seq(1,length(stringsplit))){firstletter<-paste(firstletter, substring(stringsplit[i],1,1),sep="")} search.string.l<-tolower(search.string) firstletter.l<-tolower(firstletter)...

You can create a similar plot in ggplot, but you will need to do some reshaping of the data first. library(reshape2) library(ggplot2) #ggplot needs a dataframe data <- as.data.frame(data) #id variable for position in matrix data$id <- 1:nrow(data) #reshape to long format plot_data <- melt(data,id.var="id") #plot ggplot(plot_data, aes(x=id,y=value,group=variable,colour=variable)) + geom_point()+ geom_line(aes(lty=variable))...

Principal == input$selectPrincipal | input$selectPrincipal == "All" ...

Using IRanges, you should use findOverlaps or mergeByOverlaps instead of countOverlaps. By default, though, it doesn't return the queries that have no match. I'll leave that to you. Instead, I'll show an alternate method using foverlaps() from the data.table package: require(data.table) subject <- data.table(interval = paste("int", 1:4, sep=""), start = c(2,10,12,25), end = c(7,14,18,28)) query...

The problem arises from your mixture of subsetting types here: df$target[which(df$snakes=='a'),] Once you use $ the output is no longer a data.frame, and the two parameter [ subsetting is no longer valid. You are better off compacting it to: sum(df[df$snakes=="a","target"]) [1] 23 As for your model, you can just create...
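To make the difference between the two subsetting styles concrete, here is a sketch on made-up data (the original df is not shown; the values are chosen only so that the total matches the 23 printed above).

```r
# Hypothetical data frame
df <- data.frame(snakes = c("a", "b", "a"),
                 target = c(10, 5, 13),
                 stringsAsFactors = FALSE)

# Two-argument [ on the data frame: rows where snakes is "a", column "target"
total_bracket <- sum(df[df$snakes == "a", "target"])

# After $, the result is a plain vector, so it takes a single index
total_dollar <- sum(df$target[df$snakes == "a"])
```

Both forms give the same total; the failing version mixed them by applying a row-and-column index to the vector returned by $.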

It looks like you're trying to grab summary functions from each entry in a list, ignoring the elements set to -999. You can do this with something like: get_scalar <- function(name, FUN=max) { sapply(mydata[,name], function(x) if(all(x == -999)) NA else FUN(as.numeric(x[x != -999]))) } Note that I've changed your function...

python,sql,r,graph,connected-components

In R, you could use the package igraph: library(igraph) gg <- graph.edgelist(as.matrix(d), directed=F) split(V(gg)$name, clusters(gg)$membership) #$`1` #[1] "a" "b" "c" "u" "e" # #$`2` #[1] "f" "g" "h" "j" # #$`3` #[1] "z" "y" And you can look at the graph using: plot(gg) This is based on an excellent answer...

Some reproducible code would allow me to give you some example code, but in the absence of that... wrap what you currently have in another if(), checking for a length of 0 (or just && it, with the NULL check first), and display your favorite placeholder message....

Change the panel.margin argument to panel.margin = unit(c(-0.5,0,-0.5,0), "lines"). For some reason the top and bottom margins need to be negative to line up perfectly. Here is the result: ...

Reusing the chunks in a slightly different way (thanks @george-dontas) gets me what I want. The calculated values before the R and the R with its discussion in the appendix. \documentclass{article} \begin{document} \title{For Bosses and R Experts} \author{Joe Collins} \maketitle <<*, echo=FALSE, include=FALSE>>= <<data>> <<chart>> <<statistics>> @ \section{For the Boss}...

you can try this ll[order(sapply(ll, FUN = function(x) x[1]))] [[1]] [1] "2015-01-01" "2015-01-10" [[2]] [1] "2015-02-01" "2015-02-10" [[3]] [1] "2015-03-01" "2015-03-10" and from Akrun's comment ll[order(sapply(ll, `[[`, 1))] ...

An option using data.table library(data.table) setDT(df1)[, Count:=.N, ID] # ID category Count #1: 101 A 3 #2: 101 B 3 #3: 101 C 3 #4: 102 A 1 #5: 103 B 2 #6: 103 C 2 Or using dplyr library(dplyr) df1 %>% group_by(ID) %>% mutate(Count=n()) Or using base R df1$Count...

R prefers to use i rather than j. Also note that complex is different from as.complex and the latter is used for conversion. You can do myStr <- "0.76+0.41j" myStr_complex <- as.complex(sub("j","i",myStr)) Im(myStr_complex) # [1] 0.41 ...

If you need the comments, you still can replace the 6th comma with a semicolon and use your previous solution: gsub("((?:[^,]*,){5}[^,]*),", "\\1;", vec1, perl=TRUE) Regex explanation: ((?:[^,]*,){5}[^,]*) - a capturing group that we will reference to as Group 1 with \\1 in the replacement pattern, matching (?:[^,]*,){5} - 5 sequences...

If I understand correctly you want to evaluate the first expression with the first value of x, the second with the second etc. You could do: mapply(function(ex, x) eval(ex, envir = list(x = x)), funs.list[1:2], c(7, 60)) ...
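A self-contained sketch of the same pairing. The contents of funs.list are not shown in the original, so the two quoted expressions below are assumptions made purely for illustration.

```r
# Hypothetical list of unevaluated expressions in x
funs.list <- list(quote(x^2), quote(x + 1))

# Pair the i-th expression with the i-th value of x and evaluate it
# in a small environment that contains only that x
res <- mapply(function(ex, x) eval(ex, envir = list(x = x)),
              funs.list[1:2], c(7, 60))
```

With these assumed expressions, the first result is 7 squared and the second is 60 plus 1.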

You can try with difftime df1$time.diff <- with(df1, difftime(time.stamp2, time.stamp1, units='mins')) df1 # time.stamp1 time.stamp2 time.diff #1 2015-01-05 15:00:00 2015-01-05 16:00:00 60 mins #2 2015-01-05 16:00:00 2015-01-05 17:00:00 60 mins #3 2015-01-05 18:00:00 2015-01-05 20:00:00 120 mins #4 2015-01-05 19:00:00 2015-01-05 20:00:00 60 mins #5 2015-01-05 20:00:00 2015-01-05 22:00:00 120...

if (length(z) %% 2) { z[-c(1, ceiling(length(z)/2), length(z))] } else z[-c(1, c(1,0) + floor(length(z)/2), length(z))] ...

As per ?zoo: Subscripting by a zoo object whose data contains logical values is undefined. So you need to wrap the subsetting in a which call: log_ret[which(!is.finite(log_ret))] <- 0 log_ret x y z s p t 2005-01-01 0.234 -0.012 0 0 0.454 0 ...

We can use one of the aggregating functions. Using data.table, we convert the 'data.frame' to 'data.table' (setDT(input)), grouped by 'user.id', we create an 'indicator' variable by checking the elements in 'user_type' that are 'new' (user_type=='new') and at the same time meets the condition that it is the first observation ((1:.N)==1L)),...

multivariate multiple regression can be done by lm(). This is very well documented, but here follows a little example: rawMat <- matrix(rnorm(200), ncol=2) noise <- matrix(rnorm(200, 0, 0.2), ncol=2) B <- matrix( 1:4, ncol=2) P <- t( B %*% t(rawMat)) + noise fit <- lm(P ~ rawMat) summary( fit )...

You can do it with rJava package. install.packages('rJava') library(rJava) .jinit() jObj=.jnew("JClass") result=.jcall(jObj,"[D","method1") Here, JClass is a Java class that should be in your ClassPath environment variable, method1 is a static method of JClass that returns double[], [D is a JNI notation for a double array. See that blog entry for...

Assuming that you want to get the rowSums of columns that have 'Windows' as column names, we subset the dataset ("sep1") using grep. Then get the rowSums(Sub1), divide by the rowSums of all the numeric columns (sep1[4:7]), multiply by 100, and assign the results to a new column ("newCol") Sub1...

The cause of the error: At the beginning of the function call, the elements of value all have class "character". But when you hit value[value=="secondary"] <- label_secondary a bunch of those elements get replaced by expressions. So when you then try to do value[value=="primary"] <- label_primary R is trying to...

You have called the argument costs and not cost. Here's an example using the sample data in ?svm so you can try this: model <- svm(Species ~ ., data = iris, cost=.6) model$cost # [1] 0.6 model <- svm(Species ~ ., data = iris, costs=.6) model$cost # [1] 1 R...

Two small changes: mvad_long$id <- as.factor(mvad_long$id) ggplot(data=mvad_long,aes(x=Month,y=id,fill=state))+ geom_tile()+facet_wrap(~cluster,scales = "free_y") ggplot was treating id as a numerical variable, rather than a factor, and then the scales were fixed....

r,if-statement,recursion,vector,integer

Your sapply call is applying fun across all values of x, when you really want it to be applying across all values of i. To get the sapply to do what I assume you want to do, you can do the following: sapply(X = 1:length(x), FUN = fun, x =...

It's easier to think of it in terms of the two exposures that aren't used, rather than the five that are. Let's limit the number of times an exposure can be excluded: draw_exc <- function(exposures,nexp,ng,max_excluded = 10){ nexc <- length(exposures)-nexp exp_rem <- exposures exc <- matrix(,ng,nexc) for (i in 1:ng){...

If you're willing to give xml2 a go, you can get to begin in a few lines: library(xml2) library(magrittr) # get a vector doc <- read_xml("~/Dropbox/Data.xml") doc %>% xml_find_all("//d1:event/d1:begin", ns=xml_ns(doc)) %>% xml_text() %>% as.numeric() ## [1] 0.24 0.73 1.25 1.75 2.24 2.75 3.27 3.76 4.30 4.77 5.28 5.78 6.32 6.82...

r,optimization,circular,maximization

I would compute all the pairs of rows in df: (pairs <- cbind(1:nrow(df), c(2:nrow(df), 1))) # [,1] [,2] # [1,] 1 2 # [2,] 2 3 # [3,] 3 4 # [4,] 4 5 # [5,] 5 6 # [6,] 6 1 You can find the best pairing with which.max:...

Try indx <- as.numeric(sub('.*g', '', dat_name[,1])) data1 <- ex.data.frame data1[] <- lapply(ex.data.frame, function(x) dat_name[,1][match(x, indx)]) data1 # c1 c2 c3 #1 At5g003 At5g002 At5g001 #2 At5g004 At5g005 At5g002 #3 At5g001 <NA> At5g003 #4 <NA> <NA> At5g004 #5 <NA> <NA> At5g005 EDIT If the strings are random, you could do indx...

r,mathematical-optimization,quadprog,quadratic-programming

You can do this with the solve.QP function from quadprog. From ?solve.QP, we read that solve.QP solves systems of the form min_b {-d'b + 0.5 b'Db | A'b >= b0}. You are solving a problem of the form min_w {-A'w + pw'Cw | w >= 0, 1'w = 1}. Thus,...

If you read on the R help page for as.Date by typing ?as.Date you will see there is a default format assumed if you do not specify. So to specify for your data you would do nmmaps$date <- as.Date(nmmaps$date, format="%m/%d/%Y") ...
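As a quick illustration, the sketch below parses a couple of made-up month/day/year strings (stand-ins for nmmaps$date, which is not shown) and prints them in the ISO form that R uses for Date objects.

```r
# Hypothetical date strings in month/day/year order
raw <- c("03/15/2015", "12/01/2014")

# Tell as.Date how the input is laid out
parsed <- as.Date(raw, format = "%m/%d/%Y")

# Date objects are displayed as year-month-day
iso <- format(parsed)
```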

Set renderer to canvas in set_options: library(ggvis) mtcars %>% ggvis(~wt, ~mpg) %>% layer_points() %>% set_options(width = 300, height = 200, padding = padding(10, 10, 10, 10), renderer = "canvas") ...

Or you could place a rectangle on the region of interest: rect(xleft=1994,xright = 1998,ybottom=range(CVD$cvd)[1],ytop=range(CVD$cvd)[2], density=10, col = "blue") ...

In the link that I mentioned in the comment, you can find solutions using RCurl and httr package. Here, I provide the solution using rvest package. library(rvest) kk<-html("http://en.wikipedia.org/wiki/List_of_S%26P_500_companies")%>% html_table(fill=TRUE)%>% .[[1]] # table 1 only head(kk) Ticker symbol Security SEC filings GICS Sector GICS Sub Industry Address of Headquarters 1 MMM 3M...

Using data.table library(data.table) setDT(df1)[, list(pages=paste(page, collapse="_")), list(user_id, date=as.Date(date, '%m/%d/%Y'))] Or using dplyr library(dplyr) df1 %>% group_by(user_id, date=as.Date(date, '%m/%d/%Y')) %>% summarise(pages=paste(page, collapse='_')) ...

r,data.table,stata,code-translation

Your intuition is correct. collapse is the Stata equivalent of R's aggregate function, which produces a new dataset from an input dataset by applying an aggregating function (or multiple aggregating functions, one per variable) to every variable in a dataset.
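A rough side-by-side, using the built-in mtcars data as a stand-in for the asker's dataset. The Stata line in the comment is the assumed counterpart, shown only for orientation.

```r
# Stata (for comparison):  collapse (mean) mpg hp, by(cyl)
# Base R: one row per value of cyl, with the mean of mpg and hp in each group
collapsed <- aggregate(cbind(mpg, hp) ~ cyl, data = mtcars, FUN = mean)
```

The result is a new data frame with one row per group, which mirrors how collapse replaces the dataset in memory with the aggregated one.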

Assuming your workingNational data doesn't have gaps or other irregularities, you could look up the location of each ad time in workingNational and then just take the five entries leading up to that time: indices <- match(tvNationalSale$Ad.Time, workingNational$datetime) tvNationalSale$fiveMinutesBefore <- rowSums(sapply(1:5, function(x) workingNational$sessions[indices-x])) head(tvNationalSale) # Ad.Time fiveMinutesBefore # 1 2015-01-03...

If you only have 4 GB of RAM you cannot put 5 GB of data 'into R'. You can alternatively look at the 'Large memory and out-of-memory data' section of the High Performance Computing task view in R. Packages designed for out-of-memory processes such as ff may help you. Otherwise...

If you melt your data.table to long format, this is easy: library(reshape2) news1 <- melt(news, id.vars = "ID") news2 <- news1[abs(value) > 0.01,] # ID variable value #1: 8 diff.jan 0.101 #2: 202 diff.apr 10.000 #3: 203 diff.apr 11.000 #4: 50 diff.aug 0.221 dcast.data.table(news2, ID ~ variable) # ID diff.jan...

I would create a list of all your matrices using mget and ls (with a regular expression matching the names of your matrices) and then modify them all at once using lapply and the colnames<- and rownames<- replacement functions. Something along these lines: l <- mget(ls(pattern = "m\\d+.m")) lapply(l, function(x)...
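A self-contained sketch of this approach. The matrix names, the dimension names, and the use of a separate environment are all assumptions for illustration; the pattern is also anchored and escaped here so that it matches only names of the form m<digits>.m.

```r
# Hypothetical matrices stored in their own environment
env <- new.env()
env$m1.m <- matrix(1:4, nrow = 2)
env$m2.m <- matrix(5:8, nrow = 2)

# Collect every object whose name matches the pattern into a named list
l <- mget(ls(env, pattern = "^m\\d+\\.m$"), envir = env)

# Set the dimnames on all matrices at once; the function must return x
l <- lapply(l, function(x) {
  colnames(x) <- c("a", "b")
  rownames(x) <- c("r1", "r2")
  x
})
```

Note that lapply returns modified copies in a list, so the original objects stay untouched unless the list is written back.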

Here is some sample code based on what you had in your original problem which will aggregate Twitter results for a set of users: # create a data frame with 4 columns and no rows initially df_result <- data.frame(t(rep(NA, 4))) names(df_result) <- c('id', 'name', 's_name', 'fol_count') df_result <- df_result[0:0,] #...

r,arguments,string-matching,agrep

From the help file: If ‘cost’ is not given, ‘all’ defaults to 10%, and the other transformation number bounds default to ‘all’. As far as I understand it means that either cost or all is a limiting factor even if you set del, ins and sub. If you want to...

This will give what you want: Foo <- function(x){ if (length(ll[[x]])>1) { print(x) x <- mean(ll[[x]]) } else { x <- ll[[x]] } } ll_new <- lapply(setNames(names(ll), names(ll)), Foo) # [1] "B" ll_new # $A # [1] 10 # # $B # [1] 25 # # $C # [1] 40...

Use GetFitARpMLE(z,4) You will get > GetFitARpMLE(z,4) $loglikelihood [1] -2350.516 $phiHat ar1 ar2 ar3 ar4 0.0000000 0.0000000 0.0000000 -0.9262513 $constantTerm [1] 0.05388392 ...

You can do something like this: print_test<-function(x) { Sys.sleep(x) cat("hello world") } print_test(15) If you want to execute it for a certain number of iterations, you can incorporate a for loop with that number of iterations in your function....

I'm going with the assumption you meant "to the right" since you said "Another solution might be to drawn a polygon around the Baltic Sea and only to select the points within this polygon" # your sample data pts <- read.table(text="lat long 59.979687 29.706236 60.136177 28.148186 59.331383 22.376234 57.699154 11.667305...

A better approach would be to read the files into a list of data.frames, instead of one data.frame object per file. Assuming files is the vector of file names (as you imply above): import <- lapply(files, read.csv, header=FALSE) Then if you want to operate on each data.frame in the list...
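The idea can be tried end to end with temporary files. Everything below (file names, contents) is invented for the sketch; only the lapply/read.csv call comes from the answer.

```r
# Write two small header-less CSV files into a temporary directory
tmp <- tempfile()
dir.create(tmp)
write.table(data.frame(1:2, c("a", "b")), file.path(tmp, "f1.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE)
write.table(data.frame(3:5, c("c", "d", "e")), file.path(tmp, "f2.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE)

files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# One data.frame per file, all held in a single list
import <- lapply(files, read.csv, header = FALSE)

# Operate on every element, or stack them into one data.frame
row_counts <- sapply(import, nrow)
combined <- do.call(rbind, import)
```

Keeping the data frames in a list means any later step is a single lapply or sapply call instead of one line per file.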

Do not use the dates in your plot, use a numeric sequence as x axis. You can use the dates as labels. Try something like this: y=GED$Mfg.Shipments.Total..USA. n=length(y) model_a1 <- auto.arima(y) plot(x=1:n,y,xaxt="n",xlab="") axis(1,at=seq(1,n,length.out=20),labels=index(y)[seq(1,n,length.out=20)], las=2,cex.axis=.5) lines(fitted(model_a1), col = 2) The result depending on your data will be something similar: ...

x<-c('AAIT', 'AAL', 'AAME') kk<-lapply(x,function(i) download.file(paste0("http://ichart.finance.yahoo.com/table.csv?s=",i),paste0(i,".csv"))) if you want to directly read the file: jj<- lapply(x,function(i) read.csv(paste0("http://ichart.finance.yahoo.com/table.csv?s=",i))) ...

You can use findInterval in combination with by: by(df,findInterval(df$Y,quantile(df$Y,c(0.25,0.5,0.75))),estFun) ...
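A small sketch of how the grouping works. The data and the body of estFun are placeholders; in the original, estFun is the asker's own estimation function.

```r
# Hypothetical data
df <- data.frame(Y = 1:8, X = c(2, 4, 1, 3, 8, 6, 7, 5))

# Placeholder for the asker's estimation function
estFun <- function(d) mean(d$X)

# findInterval labels each Y with the number of quartile cut points at or below it,
# giving group codes 0, 1, 2 and 3
grp <- findInterval(df$Y, quantile(df$Y, c(0.25, 0.5, 0.75)))

# Apply estFun to the rows of df within each quartile group
res <- by(df, grp, estFun)
```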

Try split(vec,cumsum(c(1, abs(diff(vec))))) #$`1` #[1] 1 1 1 1 1 1 #$`2` #[1] 0 0 0 0 0 0 0 0 0 0 #$`3` #[1] 1 1 1 1 1 1 1 1 1 1 1 #$`4` #[1] 0 0 0 0 Or use rle split(vec,inverse.rle(within.list(rle(vec), values <- seq_along(values)))) If...

You can try library(data.table)#v1.9.4+ setDT(yourdf)[, .N, by = A] ...

You could loop through the rows of your data, returning the column names where the data is set with an appropriate number of NA values padded at the end: `colnames<-`(t(apply(dat == 1, 1, function(x) c(colnames(dat)[x], rep(NA, 4-sum(x))))), paste("Impair", 1:4)) # Impair1 Impair2 Impair3 Impair4 # 1 "A" NA NA NA...

There is nothing wrong with the >=, your problem is that 1 is not really one. Try this Ax >= 1 [1] FALSE Ax == 1 [1] FALSE and format(Ax, digits = 20) [1] "0.99999999999999977796" Edit: A possible Solution As solutions to your problems you can return the final result...
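The effect is easy to reproduce. The sketch below builds a value that should be 1 but is not (it is not the asker's Ax, whose computation is not shown) and compares it with a tolerance; the choice of tolerance is an assumption, not necessarily the fix the answer goes on to propose.

```r
# Adding 0.1 ten times in double precision falls just short of 1
Ax <- Reduce(`+`, rep(0.1, 10))

# The plain comparison fails
naive <- Ax >= 1

# Allow for rounding error with a small tolerance
tol <- sqrt(.Machine$double.eps)
tolerant <- Ax >= 1 - tol

# all.equal applies a comparable tolerance for equality checks
nearly_equal <- isTRUE(all.equal(Ax, 1))
```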

You can get the values with get or mget (for multiple objects) lst <- mget(myvector) lapply(seq_along(lst), function(i) write.csv(lst[[i]], file=paste(myvector[i], '.csv', sep=''))) ...

The problem is that you pass the condition as a string and not as a real condition, so R can't evaluate it when you want it to. If you still want to pass it as a string, you need to parse and eval it in the right place, for example: cond...
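A minimal sketch of the parse-and-eval step, on invented data and an invented condition string. The point is only where the evaluation happens: inside the data frame, so that x refers to its column.

```r
# Hypothetical data and condition
df <- data.frame(x = 1:5)
cond <- "x > 3"

# Turn the string into an expression, then evaluate it with df as the environment
keep <- eval(parse(text = cond), envir = df)

subset_df <- df[keep, , drop = FALSE]
```

Evaluating strings this way runs whatever code the string contains, so it is only appropriate when the condition comes from a trusted source.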

Use addNA to treat NA as a distinct level of x. > temp.df$x <- addNA(temp.df$x) > aggregate(count ~ x + y, data=temp.df, FUN=sum, na.rm=FALSE, na.action=na.pass) x y count 1 1 A 2 2 <NA> A 2 3 3 B 1 4 10 B 1 ...

In linux, you could use awk with fread or it can be piped with read.table. Here, I changed the delimiter to , using awk pth <- '/home/akrun/file.txt' #change it to your path v1 <- sprintf("awk '/^(ID_REF|LMN)/{ matched = 1} matched {$1=$1; print}' OFS=\",\" %s", pth) and read with fread library(data.table)...

Using dplyr for your first problem: left_join(contacts, listings, by = c("id" = "id")) %>% filter(abs(listing_date - contact_date) < 30) %>% group_by(id) %>% summarise(cnt = n()) %>% right_join(listings) And the output is: id cnt city listing_date 1 6174 2 A 2015-03-01 2 2175 3 B 2015-03-14 3 9176 1 B 2015-03-30...

Combining the example by @Robert and code from the answer featured here: How to get a reversed, log10 scale in ggplot2? library("scales") library(ggplot2) reverselog_trans <- function(base = exp(1)) { trans <- function(x) -log(x, base) inv <- function(x) base^(-x) trans_new(paste0("reverselog-", format(base)), trans, inv, log_breaks(base = base), domain = c(1e-100, Inf)) }...

You can loop using names of the list object and save lapply(names(mylistdf), function(x) { x1 <- mylistdf[[x]] save(x1, file=paste0(getwd(),'/', x, '.RData')) }) ...

Try library(stringr) str_extract(word, '.*(?=\\.csv)') #[1] "dirtyboards" Another option which works for the example provided (and not very specific) str_extract(word, '^[^.]+') #[1] "dirtyboards" Update Including 'foo.csv.csv', word1 <- c("dirtyboards.csv" , "boardcsv.csv", "foo.csv.csv") str_extract(word1, '.*(?=\\.csv$)') #[1] "dirtyboards" "boardcsv" "foo.csv" ...

This should get you headed in the right direction, but be sure to check out the examples pointed out by @Jaap in the comments. library(ggmap) map <- get_map(location = "Mumbai", zoom = 12) df <- data.frame(location = c("Airoli", "Andheri East", "Andheri West", "Arya Nagar", "Asalfa", "Bandra East", "Bandra West"), values...

Given your criteria -- that 322 is represented as 3 and 2045 is 20 -- how about dividing by 100 and then rounding towards 0 with trunc(). time_24hr <- c(1404, 322, 1945, 1005, 945) trunc(time_24hr / 100) ...

r,function,optimization,mathematical-optimization

I think you want to minimize the square of a-fptotal ... ff <- function(x) myfun(x)^2 > optimize(ff,lower=0,upper=30000) $minimum [1] 28356.39 $objective [1] 1.323489e-23 Or find the root (i.e. where myfun(x)==0): uniroot(myfun,interval=c(0,30000)) $root [1] 28356.39 $f.root [1] 1.482476e-08 $iter [1] 4 $init.it [1] NA $estim.prec [1] 6.103517e-05 ...

Creating a distance matrix between n = 133763 observations requires (n^2-n)/2 pairwise comparisons. Given that each numeric (double) value takes 8 bytes of RAM, the entire matrix requires roughly 72 GB. So unfortunately you don't have enough. Algorithms based on distance matrices scale very poorly with increased data set size (since...
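The memory estimate can be worked out directly. The sketch assumes 8 bytes per stored distance, which is the size of an R double, and ignores any object overhead.

```r
n <- 133763

# Number of distinct pairs, i.e. entries in the lower triangle
n_pairs <- (n^2 - n) / 2

# Storage at 8 bytes per double, expressed in gigabytes (10^9 bytes)
gb <- n_pairs * 8 / 1e9
```

The quadratic term dominates, so doubling the number of observations roughly quadruples the memory needed.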

It's generally not a good idea to try to add rows one-at-a-time to a data.frame. It's better to generate all the column data at once and then throw it into a data.frame. For your specific example, the ifelse() function can help list<-c(10,20,5) data.frame(x=list, y=ifelse(list<8, "Greater","Less")) ...

Try combn(v1, 2, FUN=function(x) paste(rev(x), collapse="-")) #[1] "B-A" "C-A" "D-A" "E-A" "C-B" "D-B" "E-B" "D-C" "E-C" "E-D" If you want in the default order combn(v1, 2, FUN=paste, collapse="-") #[1] "A-B" "A-C" "A-D" "A-E" "B-C" "B-D" "B-E" "C-D" "C-E" "D-E" Update For a faster option, you can use combnPrim from gRbase....

Here's a recommended way to ask a question, focusing on the fact that your actual data is too big, too complicated, or too private to share. Question: how to apply a function on each row of a data.frame? My data: # make up some data s <- "Lorem ipsum dolor...

r,nested,time-series,lapply,sapply

Using plyr: As a matrix (time in cols, rows corresponding to rows of df): aaply(df, 1, function(x) weisurv(t, x$sc, x$shp), .expand = FALSE) As a list: alply(df, 1, function(x) weisurv(t, x$sc, x$shp)) As a data frame (structure as per matrix above): adply(df, 1, function(x) setNames(weisurv(t, x$sc, x$shp), t)) As a...

r,string-split,stemming,text-analysis

Given a list of English words you can do this pretty simply by looking up every possible split of the word in the list. I'll use the first Google hit I found for my word list, which contains about 70k lower-case words: wl <- read.table("http://www-personal.umich.edu/~jlawler/wordlist")$V1 check.word <- function(x, wl) {...

In the context of your code sample(rep(Schools$School.ID, each = 6)) gives a random sequence of schools where each school.id appears 6 times. Set Teachers$AssignedSchool to this sample and each teacher has an assigned school...
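A runnable sketch of the assignment. The two data frames are stand-ins with invented sizes (3 schools, 18 teachers); the seed is set only to make the shuffle repeatable.

```r
set.seed(1)

# Hypothetical tables: 3 schools and 3 * 6 = 18 teachers
Schools <- data.frame(School.ID = 1:3)
Teachers <- data.frame(Teacher.ID = 1:18)

# Repeat every school ID 6 times, then shuffle the resulting sequence
Teachers$AssignedSchool <- sample(rep(Schools$School.ID, each = 6))

# Every school ends up with exactly 6 teachers
counts <- table(Teachers$AssignedSchool)
```

This works only when the number of teachers equals six times the number of schools; otherwise the assignment fails or recycles values.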

r,conditional,subset,find-occurrences

Here's another possible data.table solution library(data.table) setDT(df1)[, list(Value = c("uncensored", "censored"), Time = c(Time[match("uncensored", Value)], Time[(.N - match("uncensored", rev(Value))) + 2L])), by = ID] # ID Value Time # 1: 1 uncensored 3 # 2: 1 censored 5 # 3: 2 uncensored 2 # 4: 2 censored 5 Or similarly,...

copy() is for copying data.tables. You are using it to copy a list. Try: zz <- lapply(z,copy) zz[[1]][ , newColumn := 1 ] Using your original code, you will see that applying copy() to the list does not make a copy of the original data.table. They are still referenced by...

Here's a solution for extracting the article lines only. Turned out much more complex and cryptic than I'd been hoping, but I'm pretty sure it works. Also, thanks to akrun for the test data. v1 <- c('ard','b','','','','rr','','fr','','','','','gh','d'); ind <-...