We can remove the NA elements with !is.na(x), but lag(x) will still return NA as the first element; that one can be handled by passing na.rm=TRUE to sd:

volcalc <- function(x) {
  x <- x[!is.na(x)]
  returns <- log(x) - log(lag(x))
  vol <- sd(returns, na.rm = TRUE) * sqrt(252)
  return(vol)
}
apply(dataexample, 2, volcalc)
#        x        y
# 3.012588 1.030484
...
First off, I would recommend not using the combination data.frame(cbind(...)). Here's why: cbind creates a matrix by default if you only pass atomic vectors to it, and matrices in R can only hold one type of data (think of a matrix as a vector with a dimension attribute, i.e....
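A minimal sketch of that coercion pitfall (toy vectors, not from the question):

df1 <- data.frame(cbind(x = 1:3, y = c("a", "b", "c")))
str(df1)  # both columns end up character: cbind() built a character matrix first
df2 <- data.frame(x = 1:3, y = c("a", "b", "c"))
str(df2)  # x stays integer; y is character (a factor in R < 4.0)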
Using your code, we need to replicate the 'repl' column to make the two subset datasets equal and then assign the values as you did val <- df2$repl[match(df1$subj, df2$subj)][row(df1[mycols])][is.na(df1[mycols])] df1[mycols][is.na(df1[mycols])] <- val df1 # subj a b c #1 1 5 5 1 #2 1 2 3 5 #3 2...
Try ave. It applies a function to groups. Have a look at ?ave for details, e.g.: df$med_card_new <- ave(df$med_card, df$hhold_no, FUN=function(x)unique(x[!is.na(x)])) # person_id hhold_no med_card med_card_new #1 1 1 1 1 #2 2 1 1 1 #3 3 1 NA 1 #4 4 1 NA 1 #5 5 1 NA...
# Setup problem
import pandas as pd
import numpy as np

num_samples = 100
s = pd.Series(np.random.randint(0, 500, num_samples),
              index=pd.date_range('03/06/2015', periods=num_samples, freq='10min'))
mask = np.random.rand(num_samples) < .7
s[mask] = np.nan

# Loop through index
# Note the perc_nan variable can be changed depending on what percentage of the interval must...
Given a data frame df, complete.cases(df) returns a vector of true or false values. You can use this vector as an index of df to extract a subset of it that has complete cases, like this: df[complete.cases(df),] The number of complete cases, or nobs value as you write in your...
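To make the counting concrete, a tiny sketch (the data frame here is made up):

df <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))
df[complete.cases(df), ]  # only row 1 survives
sum(complete.cases(df))   # 1: the nobs-style count of complete rows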
My preferred solution would be to reshape this to long format. Then you only need 1 geom_line call. Especially if you have many series, that's tidier. Same result as LyzandeR's 2nd chart. library(ggplot2) library(reshape2) test2 <- melt(test, id.var='YEAR') test2 <- na.omit(test2) ggplot(test2, aes(x=YEAR, y=value, color=variable)) + geom_line() + scale_color_manual(values=c('red', 'green'))...
You can create a logical index to subset columns other than the 'Date' class, and use that to replace the '' with NA indx <- sapply(data, class)!='Date' data[indx][data[indx]==''] <- NA It is the 'Date' class that is creating the problem. Another option would be to convert the data to matrix...
The format argument you give should reflect the format that the data is currently in, not the format that you want to convert it to, so you would have to set format = "%Y-%m-%d". Read the documentation on strptime again and it should make sense.
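For example, assuming the input strings look like "2015-03-06":

d <- as.Date("2015-03-06", format = "%Y-%m-%d")  # format describes the input
format(d, "%d/%m/%Y")  # reformat for display afterwards: "06/03/2015"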
This short example illustrates one of possible ways of introducing a new level into a factor: x <- factor(c(NA, NA, "a", "b", NA, "b")) x[is.na(x)] <- "c" # this won't work, no such level as "c" in levels(x) ## Warning message: ## In `[<-.factor`(`*tmp*`, is.na(x), value = "c") : ##...
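Presumably the fix the truncated answer goes on to show is adding the level before assigning; a sketch:

x <- factor(c(NA, NA, "a", "b", NA, "b"))
levels(x) <- c(levels(x), "c")  # introduce "c" as a legal level first
x[is.na(x)] <- "c"
x
# [1] c c a b c b
# Levels: a b c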
Well, I would distinguish between the cases of NA/NaN/Infinity and the rest. I would certainly not convert them to zero as this would distort the result significantly while at the same time, not having any real mathematical sense. If a value is NA, then it is not, as the name...
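The distinction in code (a quick sketch):

x <- c(1, NA, NaN, Inf)
is.na(x)      # FALSE  TRUE  TRUE FALSE (NaN counts as NA)
is.nan(x)     # FALSE FALSE  TRUE FALSE
is.finite(x)  #  TRUE FALSE FALSE FALSE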
Finally I came up with my own vectorized version. It returns the expected output:

na.replace <- function(x, k) {
  isNA <- is.na(x[, k])
  x[isNA, ] <- na.locf(x[, k], na.rm = FALSE)[isNA]
  x
}
...
Here's a possible solution:

is.na(j) <- j == FALSE
df[] <- df[as.matrix(j)]
df
#       X12h X13h X14h X15h X16h
# 00003   NA   NA   NA   NA   NA
# 00017   NA   NA   NA   18   NA
# 00018   NA   NA   25   NA   NA
# 00021   20   NA   NA   NA    7
# 00025   NA...
While you don't say what your timezone is, this looks like Daylight Saving Time (DST) issue. In timezones that use DST, there will be a day where the hour "jumps" from 1:59:59.999 to 3:00:00.000. This means that any times in the 2AM hour do not exist on this day. My...
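A sketch of the symptom; the timezone here is an assumption, and the exact result (NA versus a silently shifted time) is platform-dependent:

# 2015-03-08 is a US spring-forward date; the 2AM hour does not exist there
as.POSIXct("2015-03-08 02:30:00", tz = "America/New_York")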
pmax is not designed to be used with data.frame input. The error is introduced in line 35 of pmax: mmm[change] <- each[change] because each is defined to be as long as the length of the input, which for a data.frame is the number of columns. Therefore when it tries to...
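A hedged workaround sketch: hand pmax the columns as separate vectors rather than the data.frame itself (df here is made up):

df <- data.frame(a = c(1, NA, 3), b = c(2, 2, 2))
pmax(df$a, df$b, na.rm = TRUE)      # 2 2 3
do.call(pmax, c(df, na.rm = TRUE))  # same result, without naming each column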
There is a drop argument in dcast that you can use in this situation, but you'll need to convert subject to a factor. Here is a dataset with a second subject ID that has no values that meet your condition that the absolute value of z.score is less than one....
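A sketch of the call shape, with made-up column names since the original data isn't shown:

library(reshape2)
dat <- data.frame(subject  = factor(c(1, 1), levels = c(1, 2)),  # level 2 has no rows
                  variable = c("a", "b"),
                  value    = c(0.5, -0.2))
dcast(dat, subject ~ variable, drop = FALSE)  # subject 2 still appears, filled with NA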
This is all about writing a modified na.locf function. After that you can plug it into data.table like any other function.

new.locf <- function(x){
  # might want to think about the end of this loop
  # this works here but you might need to add another case
  # if there...
The right way to do iterative code in R is to avoid explicit for loops; use apply (and company) instead. @jeremycg gave you the right R-ish answer. Regarding your code, it needs a little editing to work:

temp <- c()
for (i in 1:length(data)) {
  temp[names(data)[i]] <- sum(is.na(data[i]))
...
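For reference, an apply-style equivalent of that loop (the data frame here is a stand-in for your data):

data <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 6))  # hypothetical
sapply(data, function(col) sum(is.na(col)))  # a: 1, b: 2
colSums(is.na(data))                         # same counts, fully vectorized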
library(DataCombine)
DropNA(dframe1, Var = "Sex", message = FALSE)

In your code I see two possible mistakes: 1) you didn't close the square bracket; 2) it should be dframe1$Sex, not dframe1$sex (remember that R is case-sensitive)....
You could also use the na.approx function from the zoo package. Note that this has a slightly different behavior (than the solution by @flodel) when you have two consecutive NA values. For the first and last row you could then use na.locf. y <- na.approx(x) y[nrow(y), ] <- na.locf(y[(nrow(y)-1):nrow(y), ])[2,...
Try

library(data.table)  # v >= 1.9.5 (devel version - install from GitHub)
# library(devtools)
# install_github("Rdatatable/data.table", build_vignettes = FALSE)
as.data.table(example)[, res:=(NA | (min(example)< -1))*example, by=rleid(is.na(example))][, res]
...
If only one value is not NA amongst IMILEFT and IMIRIGHT (as in your example), just try (df is your data.frame): indx<-is.na(df$IMIAVG) df$IMIAVG[indx]<-rowSums(df[indx,1:2],na.rm=TRUE) Btw, if you want to find the mean value of each row and exclude the NA values in the process, you can set the na.rm argument as...
Yes, the return types of == and %in% are different with respect to NA because of how "%in%" is defined... # Data... x <- c("A",NA,"A") # When NA is encountered NA is returned # Philosophically correct - who knows if the # missing value at NA is equal to "A"?!...
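The contrast side by side (continuing the x above):

x <- c("A", NA, "A")
x == "A"    # TRUE    NA  TRUE: comparison with NA yields NA
x %in% "A"  # TRUE FALSE  TRUE: match() never returns NA, so NA becomes FALSE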
The output of kmeans corresponds to the elements of the object passed as argument x. In your case, you omit the NA elements, and so $cluster indicates the cluster that each element of na.omit(x) belongs to. Here's a simple example: d <- data.frame(x=runif(100), cluster=NA) d$x[sample(100, 10)] <- NA clus <-...
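Picking up where the truncated example leaves off (a guess at the continuation), you can map the clusters back onto the original rows like this:

clus <- kmeans(na.omit(d$x), centers = 3)
d$cluster[!is.na(d$x)] <- clus$cluster  # the NA rows simply keep cluster = NA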
You are doing the numeric logic after the sum(). You need to do it inside the sum():

select e.[ ACCOUNT_NUM] as acct_num,
       'MTL' as src,
       sum(case when isnumeric([ TRANSACTION_AMT]) = 1
                then ABS([ TRANSACTION_AMT])
                else 0 end) as abs_total_txn_amt
from mtb..MTEL e (nolock)
group by e.[ ACCOUNT_NUM];

And, you probably...
I don't think you can do this without a loop. dat <- as.data.frame(rbind(c(NA,NA,1,3,5,NA,NA,NA), c(NA,1:3,6:8,NA), c(1:7,NA))) dat[3,2] <- NA # V1 V2 V3 V4 V5 V6 V7 V8 # 1 NA NA 1 3 5 NA NA NA # 2 NA 1 2 3 6 7 8 NA # 3 1...
Try switching the columns in TRACKS around. VLOOKUP bases its lookup on the first column, so in your case it's looking through column A (1, 2, 3, etc.). If you want your VLOOKUP to be based on the text, the text needs to be in A instead. i.e. | A |...
Change the first data.frame to long format, then it's easy. df1 is A and df2 is B. I also name the numbers id.

require(tidyr)
# wide to long (your example D)
df1tidy <- gather(df1, addname, addval, -id)
# don't need the original add* vars or NA's
df1tidy$addname <- NULL
df1tidy <- df1tidy[!is.na(df1tidy$addval), ]...
I believe that, in such clauses, Excel gives precedence to the artificial expansion of the reference to match that of the worksheet range selected (which it will always do by filling with #N/As) over first resolving the IF clause over the array. So, whereas "normally" (e.g in a single-cell array...
You have three different conditions, so it's most natural to express it in three lines:

z <- rep(0, nrow(frame))
z[apply(is.na(frame), 1, all)] <- NA
z[apply(frame==1, 1, any)] <- 1
# [1] NA  0  1
...
I think this is because in the mean(dat$x, na.rm=T) version each NA that is removed reduces the number of observations by 1, whereas if you aggregate first the removal happens within a group: in your example you have an NA in row 10 (ID 11) which is removed, but since the other rows with ID 11...
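A tiny made-up illustration of the weighting difference:

x  <- c(1, 2, NA, 10)
id <- c(11, 11, 12, 12)
mean(x, na.rm = TRUE)                   # 4.33: 13 / 3 remaining observations
g <- tapply(x, id, mean, na.rm = TRUE)  # group means: 1.5 and 10
mean(g)                                 # 5.75: both groups weighted equally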
Use ggplot: library(ggplot2) ggplot(students) + geom_boxplot(aes(x = success, y = WAM)) ...
x <- data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5))

For the whole dataframe:

sum(is.na(x)) / prod(dim(x))

Or

mean(is.na(x))

For columns:

apply(x, 2, function(col) sum(is.na(col)) / length(col))

Or

colMeans(is.na(x))
...
I was overthinking it. Below is my new function.

a <- A[!is.na(as.vector(A))]  # A, B, C, D are matrices with NA values
b <- B[!is.na(as.vector(B))]
c <- C[!is.na(as.vector(C))]
d <- D[!is.na(as.vector(D))]
mm <- function(x){ mean(x==1) }  # make percentage function
Item.1 <- lapply(data.frame(a, b, c, d), mm)  # apply function to matrices
...
If you insist on 1% daily return, you could do it in log scale, without touching Inf. set.seed(2013) P_1 <- 100 # Initial price of stock r <- rnorm(100000, .01, .05) # Generating 100,000 instances logP <- log(P_1) + cumsum(log(1+r)) dlogP <-diff(logP) # The change in logs from t+1 and...
This may have been answered before, but I don't know if it's been answered in a dplyr context. zoo::na.locf() is your friend: m %>% group_by(y1) %>% mutate(y4=zoo::na.locf(y3)) ...
Since I don't have an example df check if this works for you: do.call("cbind", lapply(err, function(x) if(min(x, na.rm=T) > -20 & max(x, na.rm=T) < 20) return(x) )) ...
Look at gsub: census$x <- gsub("?", NA, census$x, fixed = TRUE) edit: forgot to add fixed = TRUE. As Richard pointed out, this will catch all occurrences of a ?...
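fixed = TRUE matters here because ? is a regex metacharacter; a quick sketch:

x <- c("12?", "34")
# gsub("?", NA, x)            # error: invalid regex, '?' has nothing to repeat
gsub("?", NA, x, fixed = TRUE)  # treats "?" as a literal character
gsub("\\?", NA, x)              # escaping it is the regex alternative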
You can try f1 <- function(x) if(all(is.na(x))) NA_integer_ else sum(x, na.rm=TRUE) aggregate(.~name, test, FUN=f1, na.action=NULL) Or library(dplyr) test %>% group_by(name) %>% summarise_each(funs(f1)) Or library(data.table) setDT(test)[, lapply(.SD, f1), name] ...
Here's a somewhat vectorized approach with base R start <- cbind(seq_len(nrow(df)), max.col(!is.na(df[-1L]), ties.method = "first") + 1L) end <- cbind(seq_len(nrow(df)), max.col(!is.na(df[-1L]), ties.method = "last") + 1L) maxval <- do.call(pmax, c(df[-1L], na.rm = TRUE)) cbind(df[1L], start = df[start], end = df[end], maxvalue = maxval) # Person start end maxvalue # 1...
I'd use this:

DF <- read.table(text = "Dom.Supply Feed Seed Waste Processing Other.Uses Food
9 NA 1 NA NA NA 7
7 NA 1 NA NA NA 5
9 NA 2 NA NA NA 7
9 NA 2 NA NA NA 7
16 NA 2 NA NA NA 14", header ...
There are two questions here:

1) How do I get the mean of a set of numbers excluding NA?

Mean <- mean(df[, 7], na.rm = TRUE)

2) How do I replace NA with a specified value in a column?

df[, 7][is.na(df[, 7])] <- Mean
...
I got the answer: when we index big.mat we should use [,]. So here's the partial answer:

> colMeans(is.na(big.mat[,]))
             Year             Month        DayofMonth         DayOfWeek
       0.00000000        0.00000000        0.00000000        0.00000000
          DepTime        CRSDepTime           ArrTime        CRSArrTime
       0.02102102        0.00000000        0.02402402        0.00000000
    UniqueCarrier         FlightNum           TailNum ActualElapsedTime
       1.00000000        0.00000000        0.97997998        0.02402402
   CRSElapsedTime           AirTime          ArrDelay          DepDelay
       0.00000000        0.02402402        0.02402402...
Here's a little one-line function that'll work, in case you don't want to load another package: rollForward <- function(x) { c(NA, x[!is.na(x)])[cumsum(!is.na(x)) + 1] } test[,"class"] <- rollForward(test[,"class"]) test # ID class # [1,] 1 3 # [2,] 2 4 # [3,] 3 3 # [4,] NA 3 # [5,]...
is.na is a function, you should use which(!is.na(live[x,])) ...
Add two arguments when reading the file: na.strings = "*****", stringsAsFactors = FALSE...
Try the following:

# use your criteria to determine what the incorrect values are in each column
wrongs = lapply(dt_[, !"Cat", with = F], function(x) which(is.na(as.numeric(x))))

# now substitute
for (n in names(wrongs)) dt_[wrongs[[n]], (n) := originTable[[n]][wrongs[[n]]]]

dt_
#   Cat Jan Feb Mar Apr May
#1:   A   1   2   5...
It should work with column indices. I tested with this fake file: test = structure(list(testA1 = 1:5, testA2 = 1:5, testA3 = 1:5, c(NA, NA, NA, NA, NA), c(NA, NA, NA, NA, NA), c(NA, NA, NA, NA, NA), testB1 = 1:5, testB2 = 1:5, c(NA, NA, NA, NA, NA), c(NA,...
All credit for this goes to @Arun. testDT[testDT[, .I[Data[Bar == "Gray"] %between% c(0.55, 0.65)], list(Zone, SampleID, Color)]$V1] ...
Try

sapply(lst, function(x) any(colSums(!is.na(x))==0))
#[1] TRUE FALSE TRUE

Update: if you want to check for a particular column, e.g. column 2:

sapply(lst, function(x) all(is.na(x[,2])))
#[1] FALSE FALSE TRUE

Or

sapply(lst, function(x) sum(!is.na(x[,2]))==0)
#[1] FALSE FALSE TRUE

data

df <- data.frame(col1= NA, col2=1:5, col3=c(1:3,NA, NA))
df1 <- data.frame(col1=1:5, col2=6:10, col3=11:15)...
For Excel 2010 or later: =IFERROR(LOOKUP(2,1/(C1=AGGREGATE({14,15},6,$C$1:$C$12,1)),{"MAX","MIN"}),"")
You can use this solution: > t(apply(d[-1],1,function(rw) rw[range(which(!is.na(rw)))])) [,1] [,2] [1,] 62 59 [2,] 49 60 [3,] 59 34 where d is your data set. How it works: for each row of d (rows are scanned using apply(d[-1],1,...), where d[-1] excludes the first column), get the indices of non-NA test...
Try library(dplyr) DF %>% group_by(ID) %>% summarise_each(funs(sum(., na.rm=TRUE))) ...
It seems like your U column should be 2 corresponding to "B", not 1. Please clarify that. You could try match() matrix(match(demodata2, LETTERS), nrow(demodata2), dimnames=dimnames(demodata2)) # Q R S T U W # [1,] 1 2 NA NA 2 2 # [2,] 2 2 2 NA 2 2 # [3,]...
I'm not sure about what you mean by a whole file/complex function, but depending on the data type you're storing the file with, it's pretty easy using is.na(): df <- data.frame(A = rep(1, 5), B = rep(1,5)) df$B[1] <- NA df$A[3] <- NA df[is.na(df)] <- 0 ...
Currently there is no NA value available in Pandas or NumPy. From the section "Working with missing data" in the Pandas manual (http://pandas.pydata.org/pandas-docs/stable/missing_data.html): The choice of using NaN internally to denote missing data was largely for simplicity and performance reasons. It differs from the MaskedArray approach of, for example, scikits.timeseries....
I would do something like this:

idx <- which(dat1>2.5 & dat2>2.5, arr.ind=TRUE)
cbind(idx, v1=dat1[idx], v2=dat2[idx])

Reproducible example:

set.seed(1)
dat1 <- as.data.frame(matrix(runif(12,1,5), ncol=3))
dat2 <- as.data.frame(matrix(runif(12,1,5), ncol=3))
idx <- which(dat1>2.5 & dat2>2.5, arr.ind=TRUE)
cbind(idx, v1=dat1[idx], v2=dat2[idx])
#      row col       v1       v2
# [1,]   3   1 3.291413 4.079366
# [2,]   4   1 4.632831 2.990797
# [3,]   2   2 4.593559 4.967624...
You need to do the following:

# since you don't care about the Infs convert them to NAs
# so that they get removed by the mean function
# since we have set na.rm=TRUE
df$amihud[df$amihud==Inf] <- NA

library(dplyr)
# you need to use summarise to calculate the means as below:
res <- df %>%...
How about this? It works for small sample data. Your input data:

df <- read.table(header=T, text='count time
47 "15/12/2014 06:30"
3 "15/12/2014 06:31"
431 "15/12/2014 06:34"
320 "15/12/2014 06:35"
42 "15/12/2014 06:36"
13 "15/12/2014 06:37"
383 "15/12/2014 06:38"
160 "15/12/2014 06:39"')

Format the "time" column:

df$time <- as.POSIXct(df$time,...
As the others already stated in the comments, your "NAs introduced by coercion" is not reproducible. But let me just give you a hint on how to make the code more "scalable" and readable:

x <- c(1890, 1899,1900,2001,2012,1999,1943,1944,1950,1988,1981,1988,1997,2014)
brk <- seq(1890, 2020, by=10)  # breaks
cut(x, breaks=brk, right=FALSE, labels=paste(brk[-length(brk)], "s",...
x <- sample.df[ lapply( sample.df, function(x) sum(is.na(x)) / length(x) ) < 0.1 ] ...
This is one solution (brute force) with just one missing value:

prac <- list(a=c(0.203,0.305,0.444,0.780,1.000,1.101,1.403),
             b=c(0.201,0.306,0.442,0.778,1.000,1.101,1.208,1.401))
NA.index <- which(abs(prac$b[1:length(prac$a)] - prac$a) > 0.05)
newlist.a <- c(prac$a[1:NA.index-1], NA, prac$a[NA.index])

This should be generalizable (depending on how your data is actually structured):

prac <- list(a=c(0.203,0.305,0.444,0.780,1.000,1.101,1.403),
             b=c(0.201,0.306,0.442,0.778,1.000,1.101,1.208,1.401))
for(i in seq_along(prac$a))...
You can initialize out with NA values:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector fill_backward(NumericVector x) {
  int n = x.size();
  NumericVector out = NumericVector(n, NumericVector::get_na());
  for (int i = 0; i < n; ++i) {
    if (R_IsNA(x[i])) {
      for (int j = i+1; j < n; ++j) {...
It's not clear from your example, but I think the problem is that you're handling NAs incorrectly and/or using the wrong type for the data.frame's columns. Try rewriting your code like this:

# if your columns are of character type (warnings are ok)
child$G4_R_2_4 <- as.numeric(child$G4_R_2_4)
child$G4_R_2_5 <- as.numeric(child$G4_R_2_5)
child$G4_R_2_5_option2 <- as.numeric(child$G4_R_2_5_option2)

# correct NA handling
child$kg <- ifelse(is.na(child$G4_R_2_4) & child$G4_R_2_5 < child$G4_R_2_5_option2,...
You can try mydata[!rowSums(is.na(mydata[,c('x1', 'x2')])),] # y x1 x2 x3 #1 1 1 1 1 #3 1 1 1 1 #5 1 4 1 1 #6 2 5 1 1 #7 2 1 8 8 #9 2 2 2 2 #10 3 5 2 NA #11 3 2 4 4...
Looks like you have an actual NA in your names, instead of "NA". The former represents a missing value, the latter is a character string that looks like the symbol that represents the missing value. Use: df <- df[!is.na(names(df))] Illustrating with iris: > head(iris) Sepal.Length Sepal.Width Petal.Length Petal.Width Species 1...
is.finite and is.infinite don't have a data.frame or a data.table methods like is.na has (compare methods(is.infinite) vs methods(is.na)) You could alternatively loop thru the columns and then use colSums DT[, colSums(sapply(.SD, is.infinite))] # a b c # 2 0 1 Alternatively, you could use Reduce instead of colSums DT[, Reduce(`+`,...
Another option is library(dplyr) df %>% mutate_each(funs(median=.-median(., na.rm=TRUE)), -ID) ...
You just need an if/else in your function (note the braces: the else must follow the closed if block):

rankall <- function(rank) {
  split_by_state <- split(df, df$state)
  ranked_hospitals <- lapply(split_by_state, function(x) {
    indx <- x$rank == rank
    if (any(indx)) {
      return(x[indx, ])
    } else {
      out <- x[1, ]
      out$hospital <- NA
      return(out)
    }
  })
}
...
The "[<-" function can create (assign) new columns by name: > dat[ , paste0( "M_",names(dat)[-1])] <- lapply(dat[-1], function(x) as.numeric(is.na(x)) ) > dat INDEX HEIGHT LENGTH M_HEIGHT M_LENGTH 1 1 70 55 0 0 2 2 60 NA 0 1 3 3 NA 35 1 0 4 4 NA NA 1...
It seems I will be able to handle it using Amelia package: http://cran.r-project.org/web/packages/Amelia/vignettes/amelia.pdf#subsection.4.4 http://cran.r-project.org/web/packages/Amelia/Amelia.pdf and true, there is lots of materials on it on Cross Validated, e.g. http://stats.stackexchange.com/questions/95832/missing-values-nas-in-the-test-data-when-using-predict-lm-in-r @nograpes, thank you for all the hints!...
It looks like dplyr can't access newly assigned lag values. Here is a solution that should work even if the NAs are in the middle of a column.

df <- apply(df, 2, function(x){
  if(sum(is.na(x)) == 0){return(x)}
  ## updated with optimized portion from @josilber
  r <- rle(is.na(x))
  na.loc <-...
Try df <- within(df, mark <- replace(mark, names %in% c('Kate', 'Daria', 'Tom'), NA)) df # names gender mark #1 John M 1 #2 Mark M 2 #3 Larry M 3 #4 Will M 1 #5 Kate F NA #6 Daria F NA #7 Tom M NA Or df$mark[df$names %in% c('Kate',...
You can use complete.cases to get a logical vector of complete rows (TRUE = complete); then subsetting inside ad-hoc function used for testing too library(gtools) df <- data.frame(temp=rnorm(100, 10:30), prec=rnorm(100, 1:300), humi=rnorm(100, 1:100)) df$prec[c(1:10, 25:30, 95:100)] <-NA df$humi[c(15:19, 20:25, 80:90)] <-NA my.fun <- function(x,y) { my.df <- data.frame(x,y) my.df.cmpl <-...
Try

library(sp)
S@data[,'roadtype'][S@data[,'jointcount']!=1] <- NA
S
# SpatialPoints:
#     jointcount roadtype
#[1,]          1        3
#[2,]          4       NA
#[3,]          3       NA
#[4,]          1        1
#[5,]          1        4

data

jointcount = c(1,4,3,1,1)
roadtype = c(3,2,5,1,4)
S <- SpatialPoints(data.frame(jointcount, roadtype))
...
subset(dat, !duplicated(C) & !duplicated(C, fromLast=T) & is.na(B)) # A B C # 4 R NA pink # 6 V NA yellow ...
You can try either library(data.table) na.omit(RRR[, lapply(.SD, function(x) replace(x, which(x==0), NA))]) Or using set for(j in 1:ncol(RRR)){ set(RRR, i=which(RRR[[j]]==0), j=j, value=NA) } na.omit(RRR) Benchmarks set.seed(1) n <- 1000000 RRR <- data.table(matrix(rgeom(100*n,0.5), ncol=100)) RRR1 <- copy(RRR) RRR2 <- copy(RRR) RRR3 <- copy(RRR) system.time({RRR[(RRR==0)] <- NA na.omit(RRR)}) # user system elapsed #...
I think

library("plyr")
df <- mutate(df, ID=cumsum(!is.na(df$Height)))
dfsum <- ddply(df, .(ID), summarise,
               stems = length(ID),
               avg_diameter = sqrt(sum((Diameter)^2)))
head(dfsum)
## ID stems avg_diameter
## 1  1     1     7.480282
## 2  2     1     4.774648

should work ... ? To "order[] the rows of each subset acc. to desc(Diameter)":

ddply(df, .(ID), arrange, desc(Diameter))
...
This is very close to what you had, but replaces mean(x, na.rm=TRUE) with a custom function which either computes the mean of the non-NA values, or supplies NA itself: R> with(tab, aggregate(b, by=list(a), FUN=function(x) if (any(is.finite(z<-na.omit(x)))) mean(z) else NA)) Group.1 x 1 1 2 2 2 2 3 3 NA...
You can use the argument useNA = "ifany" in table. tab <- table(adult$workClass, useNA = "ifany") # Federal-gov Local-gov Never-worked Private # 960 2093 7 22696 # Self-emp-inc Self-emp-not-inc State-gov Without-pay # 1116 2541 1298 14 # <NA> # 1836 By default, the name of the NA count is NA...
I looked through your function line by line and found the problem here: falses = sum(acts[false.ind]). acts is a vector of length 24, false.ind is a vector of length 25. Therefore you are trying to subset a vector element which doesn't exist. This produces an NA. If you...
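That silent NA is just R's out-of-bounds indexing rule in miniature:

acts <- 1:24
acts[25]  # NA: reading past the end returns NA rather than raising an error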
You can add 0 to the possible factor values levels(myVec) <- 0:2 and substitute NA values by 0 myVec[is.na(myVec)] <- 0 myVec # [1] 0 0 1 0 # Levels: 0 1 2 Or, beforehand when the factor is created... (myvec <- factor(ifelse(is.na(myVec), 0, myVec))) # [1] 0 1 2...
Thanks Josh for the guidance. Here's the modified function that produces what I'm looking for:

avg.ang <- function(x, ...){
  if (sum(is.na(x)) == length(x)) {
    NA
  } else {
    round(mean.circular(circular(x, units="degrees", rotation="clock",
                                 zero=pi/2, modulo="2pi"), na.rm=TRUE))
  }
}

The na.rm=TRUE is the key. The if/else statement is to deal with occurrences where all cells=NA...
Since the intervals don't have gaps, you can use findInterval. I would change the lookup table to a list containing the break points and defaults for each value using dlply from plyr. ## Transform lookup table to a list with breaks for intervals library(plyr) lookup <- dlply(testdefs, .(LABMET_ID), function(x) list(breaks=c(rbind(x$lower,...
I think you are making it harder than you need to. The code you have in the first chunk there would be fine as a function:

SetNaToZero <- function(x) {
  x[, 2][is.na(x[, 2])] <- 0
  return(x)
}

In action:

set.seed(123)
dat <- data.frame(a=rnorm(10), b=sample(c(NA, 1:3), 10, replace=T))
SetNaToZero(dat)
a b
1...