R: Volatility function that interprets NAs

r,data.frame,time-series,na

We can remove the NA elements with !is.na(x), but lag(x) will return NA as the first element, which sd can skip with na.rm=TRUE: volcalc= function (x) { x <- x[!is.na(x)] returns=log(x)-log(lag(x)) vol=sd(returns, na.rm=TRUE)*sqrt(252) return(vol) } apply(dataexample, 2, volcalc) # x y #3.012588 1.030484 ...
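
A minimal sketch of the same calculation on a plain numeric vector, assuming daily prices and 252 trading days per year as in the answer above. It uses diff(log(x)) for the log returns, so the result does not depend on which lag() implementation happens to be loaded. The prices vector is made up for illustration.

```r
# Annualised volatility of log returns, skipping missing prices.
volcalc <- function(x) {
  x <- x[!is.na(x)]          # drop missing prices first
  returns <- diff(log(x))    # log return between consecutive observed prices
  sd(returns) * sqrt(252)    # annualise, assuming 252 trading days
}

prices <- c(100, 105, NA, 102, 108)  # hypothetical data
vol <- volcalc(prices)
```

Note that dropping the NA first treats the gap as a single period, so the return across a missing day is computed between its two neighbours.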

dplyr join define NA values

r,left-join,dplyr,na

First off, I recommend against using the combination data.frame(cbind(...)). Here's why: cbind creates a matrix by default if you only pass atomic vectors to it. And matrices in R can only hold one type of data (think of a matrix as a vector with a dimension attribute, i.e....
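
The coercion described above can be seen in a small sketch. The column names id and name are hypothetical, and stringsAsFactors = FALSE is set explicitly so the result is the same across R versions.

```r
# cbind() on atomic vectors builds a character matrix, so the numbers become strings.
bad <- data.frame(cbind(id = 1:2, name = c("a", "b")), stringsAsFactors = FALSE)
class(bad$id)   # "character"

# Passing the vectors straight to data.frame() keeps each column's own type.
good <- data.frame(id = 1:2, name = c("a", "b"), stringsAsFactors = FALSE)
class(good$id)  # "integer"
```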

R conditional replace more columns by lookup

r,data.frame,lookup,na

Using your code, we need to replicate the 'repl' column to make the two subset datasets equal and then assign the values as you did val <- df2$repl[match(df1$subj, df2$subj)][row(df1[mycols])][is.na(df1[mycols])] df1[mycols][is.na(df1[mycols])] <- val df1 # subj a b c #1 1 5 5 1 #2 1 2 3 5 #3 2...

replace NA value with the group value

r,na

Try ave. It applies a function to groups. Have a look at ?ave for details, e.g.: df$med_card_new <- ave(df$med_card, df$hhold_no, FUN=function(x)unique(x[!is.na(x)])) # person_id hhold_no med_card med_card_new #1 1 1 1 1 #2 2 1 1 1 #3 3 1 NA 1 #4 4 1 NA 1 #5 5 1 NA...

Python pandas - averaging 10 min measurements to 15min mean and 60min mean depending on the length of the data gap

python,pandas,mean,na

# Setup problem import pandas as pd import numpy as np num_samples = 100 s = pd.Series(np.random.randint(0, 500, num_samples), index=pd.date_range('03/06/2015', periods=num_samples, freq='10min')) mask = np.random.rand(num_samples) < .7 s[mask] = np.nan # Loop through index # Note the perc_nan variable can be changed depending on what percentage of the interval must...

Replacing NA's in data frame with 0, cases still not showing up in output R

r,replace,na

Given a data frame df, complete.cases(df) returns a vector of true or false values. You can use this vector as an index of df to extract a subset of it that has complete cases, like this: df[complete.cases(df),] The number of complete cases, or nobs value as you write in your...
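
As a small illustration of the indexing described above (the data frame here is invented for the example):

```r
# A toy data frame with one complete row and two incomplete ones.
df <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA), stringsAsFactors = FALSE)

ok <- complete.cases(df)   # TRUE FALSE FALSE
complete_rows <- df[ok, ]  # keep only rows with no missing values
n_complete <- sum(ok)      # number of complete cases
```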

ggplot line graph with NA values

r,ggplot2,na

My preferred solution would be to reshape this to long format. Then you only need 1 geom_line call. Especially if you have many series, that's tidier. Same result as LyzandeR's 2nd chart. library(ggplot2) library(reshape2) test2 <- melt(test, id.var='YEAR') test2 <- na.omit(test2) ggplot(test2, aes(x=YEAR, y=value, color=variable)) + geom_line() + scale_color_manual(values=c('red', 'green'))...

R date column error using data[data==“”] <- NA

r,date,na

You can create a logical index to subset columns other than the 'Date' class, and use that to replace the '' with NA indx <- sapply(data, class)!='Date' data[indx][data[indx]==''] <- NA It is the 'Date' class that is creating the problem. Another option would be to convert the data to matrix...

strptime returning NA values [closed]

r,date,format,na,strptime

The format argument you give should reflect the format that the data is currently in, not the format that you want to convert it to, so you would have to set format = "%Y-%m-%d". Read the documentation on strptime again and it should make sense.
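
A short sketch of the point about format, assuming for illustration that the input strings look like day/month/year (the original question's data is not shown here):

```r
x <- "25/03/2015"  # hypothetical input in day/month/year order

# format describes how the string is written now
parsed <- strptime(x, format = "%d/%m/%Y")

# the desired output layout is applied afterwards, with format()
iso <- format(parsed, "%Y-%m-%d")  # "2015-03-25"

# describing the target layout instead of the input layout yields NA
wrong <- strptime(x, format = "%Y-%m-%d")
```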

Converting NAs does not work in R

r,missing-data,na

This short example illustrates one of the possible ways of introducing a new level into a factor: x <- factor(c(NA, NA, "a", "b", NA, "b")) x[is.na(x)] <- "c" # this won't work, no such level as "c" in levels(x) ## Warning message: ## In `[<-.factor`(`*tmp*`, is.na(x), value = "c") : ##...
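
One way to make the assignment succeed, sketched here as an assumption about where the truncated answer is heading, is to register the new level before assigning it:

```r
x <- factor(c(NA, NA, "a", "b", NA, "b"))

# Add "c" to the set of allowed levels first ...
levels(x) <- c(levels(x), "c")

# ... then the replacement no longer triggers a warning.
x[is.na(x)] <- "c"
```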

how to deal with NaN, Na and Inf to calculate mean in R? + after measuring community detection metrics [closed]

r,nan,mean,na

Well, I would distinguish between the cases of NA/NaN/Infinity and the rest. I would certainly not convert them to zero, as this would distort the result significantly while making no real mathematical sense. If a value is NA, then it is not, as the name...
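
A minimal sketch of the difference between dropping such values and zeroing them, using a made-up vector. Whether excluding Inf is appropriate depends on what the metric means, so treat is.finite() here as one option rather than the recommended one.

```r
x <- c(1, NA, NaN, Inf, 3)  # hypothetical metric values

# na.rm = TRUE drops NA and NaN but keeps Inf, so the mean is Inf
m_narm <- mean(x, na.rm = TRUE)

# is.finite() is FALSE for NA, NaN, Inf and -Inf, so only 1 and 3 remain
m_finite <- mean(x[is.finite(x)])

# replacing the problem values with zero pulls the mean down to 0.8
x0 <- x
x0[!is.finite(x0)] <- 0
m_zero <- mean(x0)
```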

Group instances based on NA values in r

r,file,csv,instance,na

df[!is.na(df$Value), ] Size Value Location Num1 Num2 Rent 1 800 900 <NA> 2 2 y 3 1100 1300 uptown 3 3 n 4 1200 1100 <NA> 2 1 y And df[is.na(df$Value), ] Size Value Location Num1 Num2 Rent 2 850 NA midcity NA 3 y 5 1000 NA Lakeview NA...

Replace NA row with non-NA value from previous row and certain column

r,na

Finally I realized my own vectorized version. It returns expected output: na.replace <- function(x, k) { isNA <- is.na(x[, k]) x[isNA, ] <- na.locf(x[, k], na.rm = F)[isNA] x } ...

logical dataframe with numerical dataframe and substitute FALSE by NA with R

r,na

Here's a possible solution is.na(j) <- j == FALSE df[] <- df[as.matrix(j)] df # X12h X13h X14h X15h X16h # 00003 NA NA NA NA NA # 00017 NA NA NA 18 NA # 00018 NA NA 25 NA NA # 00021 20 NA NA NA 7 # 00025 NA...

converting “1984-03-25 02:00:00” to POSIX gives NA

r,datetime,posix,na

While you don't say what your timezone is, this looks like a Daylight Saving Time (DST) issue. In timezones that use DST, there will be a day where the hour "jumps" from 1:59:59.999 to 3:00:00.000. This means that any times in the 2AM hour do not exist on this day. My...
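
If the timestamps were recorded in UTC, or the zone does not matter for the analysis, one possible workaround (an assumption, since the original data's zone is unknown) is to state the timezone explicitly. UTC has no DST transitions, so every clock time exists:

```r
stamp <- "1984-03-25 02:00:00"

# Parsing in UTC cannot hit a DST gap.
utc <- as.POSIXct(stamp, tz = "UTC")
shown <- format(utc, "%Y-%m-%d %H:%M:%S")
```

How a nonexistent local time is handled in a DST-observing zone can vary by platform, so it is worth checking the result on the machine that runs the analysis.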

Why pmax(dataFrame, int) would introduce NAs?

r,data.frame,na

pmax is not designed to be used with data.frame input. The error is introduced in line 35 of pmax: mmm[change] <- each[change] because each is defined to be as long as the length of the input, which for a data.frame is the number of columns. Therefore when it tries to...
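
A common workaround, sketched here with an invented data frame, is to apply pmax to each column separately so that it only ever sees plain vectors:

```r
df <- data.frame(a = c(-1, 2, NA), b = c(3, -4, 5))  # hypothetical data

# df[] <- keeps the data.frame structure while replacing every column
df[] <- lapply(df, pmax, 0)
```

Existing NA values stay NA, but no new ones are introduced.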

Insert NA's in case there are no observations when using subset() and then dcast or tapply

r,subset,na,reshape2,tapply

There is a drop argument in dcast that you can use in this situation, but you'll need to convert subject to a factor. Here is a dataset with a second subject ID that has no values that meet your condition that the absolute value of z.score is less than one....

Conditional NA filling with data.table

r,data.table,plyr,na

This is all about writing a modified na.locf function. After that you can plug it into data.table like any other function. new.locf <- function(x){ # might want to think about the end of this loop # this works here but you might need to add another case # if there...

Sum NA values in r

r,sorting,na

The idiomatic way to write iterative code in R is to avoid explicit for loops. Use apply (and its relatives) instead. @jeremycg gave you the right R-ish answer. Regarding your code, it needs a few edits to work. temp <- c() for (i in 1:length(data)){ temp[names(data)[i]] <- sum(is.na(data[i]))...

Trouble removing NA from whole row but from just one column [duplicate]

r,na

library(DataCombine) DropNA(dframe1, Var = "Sex", message = F) In your code I see two possible mistakes: 1) You didn't close the square bracket; 2) it should be dframe1$Sex, not dframe1$sex (remember that R is case-sensitive)....

Substitute NA values depending of position in dataframe

r,row,interpolation,na,missing-data

You could also use the na.approx function from the zoo package. Note that this has a slightly different behavior (than the solution by @flodel) when you have two consecutive NA values. For the first and last row you could then use na.locf. y <- na.approx(x) y[nrow(y), ] <- na.locf(y[(nrow(y)-1):nrow(y), ])[2,...

Identify data blocks

r,na

Try library(data.table)#v >= 1.9.5 (devel version - install from GitHub). #library(devtools) #install_github("Rdatatable/data.table", build_vignettes = FALSE) as.data.table(example)[, res:=(NA | (min(example)< -1))*example, by=rleid(is.na(example))][, res] ...

replacing NA with value in adjacent column in R

r,vector,replace,atomic,na

If only one value is not NA amongst IMILEFT and IMIRIGHT (as in your example), just try (df is your data.frame): indx<-is.na(df$IMIAVG) df$IMIAVG[indx]<-rowSums(df[indx,1:2],na.rm=TRUE) Btw, if you want to find the mean value of each row and exclude the NA values in the process, you can set the na.rm argument as...

When subsetting rows with a factor with equal (==), NA's are also included. It doesn't happen with %in%. Is it normal?

r,equals,subset,na

Yes, the return types of == and %in% are different with respect to NA because of how "%in%" is defined... # Data... x <- c("A",NA,"A") # When NA is encountered NA is returned # Philosophically correct - who knows if the # missing value at NA is equal to "A"?!...
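
The effect on subsetting can be checked directly with the same vector:

```r
x <- c("A", NA, "A")

eq  <- x == "A"    # TRUE NA TRUE: comparing with NA gives NA
inn <- x %in% "A"  # TRUE FALSE TRUE: %in% never returns NA

by_eq <- x[eq]     # an NA index yields an NA element, so length 3
by_in <- x[inn]    # only the two real matches
```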

NA in clustering functions (kmeans, pam, clara). How to associate clusters to original data?

r,cluster-analysis,k-means,na

The output of kmeans corresponds to the elements of the object passed as argument x. In your case, you omit the NA elements, and so $cluster indicates the cluster that each element of na.omit(x) belongs to. Here's a simple example: d <- data.frame(x=runif(100), cluster=NA) d$x[sample(100, 10)] <- NA clus <-...

Error converting data type varchar to float while calculating absolute sum

sql,sql-server,group-by,sum,na

You are doing the numeric logic after the sum(). You need to do it inside the sum(): select e.[ ACCOUNT_NUM] as acct_num, 'MTL' as src, sum(case when isnumeric([ TRANSACTION_AMT]) = 1 then ABS([ TRANSACTION_AMT]) else 0 end) as abs_total_txn_amt from mtb..MTEL e (nolock) group by e.[ ACCOUNT_NUM]; And, you probably...

Dropping all left NAs in a dataframe and left shifting the cleaned rows

r,tail,na

I don't think you can do this without a loop. dat <- as.data.frame(rbind(c(NA,NA,1,3,5,NA,NA,NA), c(NA,1:3,6:8,NA), c(1:7,NA))) dat[3,2] <- NA # V1 V2 V3 V4 V5 V6 V7 V8 # 1 NA NA 1 3 5 NA NA NA # 2 NA 1 2 3 6 7 8 NA # 3 1...

Excel VLookup #NV error

excel,vlookup,na

Try switching the columns in TRACKS around. VLOOKUP bases its lookup on the first column, so in your case, it's looking through column A (1, 2, 3, etc.) If you want your VLOOKUP to be based on the text, it needs to be in A instead. i.e. | A |...

Merging data frames row-wise and column-wise in R

r,merge,repeat,na

Change the first data.frame to long format, then it's easy. df1 is A and df2 is B. I also name the numbers id. require(tidyr) # wide to long (your example D) df1tidy <- gather(df1,addname,addval,-id) # don't need the original add* vars or NA's df1tidy$addname <- NULL df1tidy <- df1tidy[!is.na(df1tidy$addval), ]...

Excel, Array Formulas, N/A outside of range, and ROW()

excel,na,array-formulas

I believe that, in such clauses, Excel gives precedence to the artificial expansion of the reference to match that of the worksheet range selected (which it will always do by filling with #N/As) over first resolving the IF clause over the array. So, whereas "normally" (e.g in a single-cell array...

Recoding variables with NAs in R

r,na,recode

You have three different conditions, so it's most natural to express it in three lines: z <- rep(0,nrow(frame)) z[apply(is.na(frame),1,all)] <- NA z[apply(frame==1 ,1,any)] <- 1 # [1] NA 0 1 ...

Why do mean() and mean(aggregate()) return different results?

r,aggregate,mean,na

I think this is because in the mean(dat$x, na.rm=T) version, each NA that is removed, reduces the number of observations by 1, whereas if you aggregate first, in your example you have an NA in row 10 (ID 11) which is removed but since the other rows with ID 11...
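
A toy example of the effect (the data is invented, not the asker's): the overall mean weights every non-missing observation equally, while the mean of group means weights every group equally.

```r
dat <- data.frame(id = c(1, 1, 1, 2), x = c(1, 2, NA, 10))

# three non-missing observations: (1 + 2 + 10) / 3
overall <- mean(dat$x, na.rm = TRUE)

# the formula interface drops the NA row, giving group means 1.5 and 10
grp <- aggregate(x ~ id, data = dat, FUN = mean)

# two groups: (1.5 + 10) / 2
mean_of_means <- mean(grp$x)
```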

plot data - boxplot in R how to remove NAs

r,plot,na

Use ggplot: library(ggplot2) ggplot(students) + geom_boxplot(aes(x = success, y = WAM)) ...

How to find the percentage of NAs in a data.frame using apply?

r,csv,data.frame,apply,na

x = data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5)) For the whole dataframe: sum(is.na(x))/prod(dim(x)) Or mean(is.na(x)) For columns: apply(x, 2, function(col)sum(is.na(col))/length(col)) Or colMeans(is.na(x)) ...

From multiple matrices, calculate percentage of a value from each matrix and store them into a new vector

r,function,matrix,percentage,na

I was overthinking it. Below is my new function. a<-A[!is.na(as.vector(A))] # A, B, C, D are matrices with NA values b<-B[!is.na(as.vector(B))] c<-C[!is.na(as.vector(C))] d<-D[!is.na(as.vector(D))] mm<-function(x){mean(x==1)} # make percentage function Item.1<-lapply(data.frame(a,b,c,d), mm) # apply function to matrices ...

Handling NA and NAN in R

r,na

If you insist on 1% daily return, you could do it in log scale, without touching Inf. set.seed(2013) P_1 <- 100 # Initial price of stock r <- rnorm(100000, .01, .05) # Generating 100,000 instances logP <- log(P_1) + cumsum(log(1+r)) dlogP <-diff(logP) # The change in logs from t+1 and...

fill in NA based on the last non-NA value for each group in R [duplicate]

r,dplyr,na

This may have been answered before, but I don't know if it's been answered in a dplyr context. zoo::na.locf() is your friend: m %>% group_by(y1) %>% mutate(y4=zoo::na.locf(y3)) ...

How to deal with NA when using lapply in R

r,if-statement,na

Since I don't have an example df check if this works for you: do.call("cbind", lapply(err, function(x) if(min(x, na.rm=T) > -20 & max(x, na.rm=T) < 20) return(x) )) ...

how do I remove question mark(?) from a data set in R

r,na

Look at gsub: census$x <- gsub("?",NA,census$x, fixed = TRUE) edit: forgot to add fixed = TRUE As Richard pointed out, this will catch all occurrences of a ?...

make sum of an empty set/set of NA's NA instead of 0?

r,sum,na

You can try f1 <- function(x) if(all(is.na(x))) NA_integer_ else sum(x, na.rm=TRUE) aggregate(.~name, test, FUN=f1, na.action=NULL) Or library(dplyr) test %>% group_by(name) %>% summarise_each(funs(f1)) Or library(data.table) setDT(test)[, lapply(.SD, f1), name] ...

How to select the first and last one test without NA in r

r,select,na

Here's a somewhat vectorized approach with base R start <- cbind(seq_len(nrow(df)), max.col(!is.na(df[-1L]), ties.method = "first") + 1L) end <- cbind(seq_len(nrow(df)), max.col(!is.na(df[-1L]), ties.method = "last") + 1L) maxval <- do.call(pmax, c(df[-1L], na.rm = TRUE)) cbind(df[1L], start = df[start], end = df[end], maxvalue = maxval) # Person start end maxvalue # 1...

How test if an NA value is equal to zero; replace if so, leave as NA if not

r,replace,na

I'd use this: DF <- read.table(text = "Dom.Supply Feed Seed Waste Processing Other.Uses Food 9 NA 1 NA NA NA 7 7 NA 1 NA NA NA 5 9 NA 2 NA NA NA 7 9 NA 2 NA NA NA 7 16 NA 2 NA NA NA 14", header...

Replace missing values with mean [closed]

r,na

There are two questions here: 1) How do I get the mean of a set of numbers excluding NA? Mean = mean(df[, 7], na.rm = TRUE) 2) How do I replace NA with a specified value in a column? df[,7][is.na(df[,7])] <- Mean ...

Finding the percentages of missing information in each column in parallel using bigmemory and parallel packages in R

r,parallel-processing,apply,jobs,na

I got the answer. When we call big.mat we should use [,] so here's the partial answer. > colMeans(is.na(big.mat[,])) Year Month DayofMonth DayOfWeek 0.00000000 0.00000000 0.00000000 0.00000000 DepTime CRSDepTime ArrTime CRSArrTime 0.02102102 0.00000000 0.02402402 0.00000000 UniqueCarrier FlightNum TailNum ActualElapsedTime 1.00000000 0.00000000 0.97997998 0.02402402 CRSElapsedTime AirTime ArrDelay DepDelay 0.00000000 0.02402402 0.02402402...

Function to assign cell value to subsequent NA-cells (same column) [duplicate]

r,if-statement,na

Here's a little one-line function that'll work, in case you don't want to load another package: rollForward <- function(x) { c(NA, x[!is.na(x)])[cumsum(!is.na(x)) + 1] } test[,"class"] <- rollForward(test[,"class"]) test # ID class # [1,] 1 3 # [2,] 2 4 # [3,] 3 3 # [4,] NA 3 # [5,]...

How to sum each row in a .xts object, where values are NOT missing

r,xts,na,data-manipulation

is.na is a function, you should use which(!is.na(live[x,])) ...

Removing non-numeric values from data in R

r,data.frame,na

add two arguments when reading the file: na.strings = "*****", stringsAsFactors = FALSE...
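
A self-contained sketch of those two arguments, reading from an inline string instead of a file (the column names and values are made up):

```r
csv_text <- "id,value\n1,3.5\n2,*****\n3,7.25"  # hypothetical file contents

# "*****" is read as NA, so the column stays numeric instead of becoming text
df <- read.csv(text = csv_text, na.strings = "*****", stringsAsFactors = FALSE)
```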

handling 'wrong' entries and NAs in a data.table substituting them with entries from other table

r,data.table,na

Try the following: # use your criteria to determine what the incorrect values are in each column wrongs = lapply(dt_[, !"Cat", with = F], function(x) which(is.na(as.numeric(x)))) # now substitute for (n in names(wrongs)) dt_[wrongs[[n]], (n) := originTable[[n]][wrongs[[n]]]] dt_ # Cat Jan Feb Mar Apr May #1: A 1 2 5...

Extracting multiple data files from a single csv file

r,csv,na,read-data

It should work with column indices. I tested with this fake file: test = structure(list(testA1 = 1:5, testA2 = 1:5, testA3 = 1:5, c(NA, NA, NA, NA, NA), c(NA, NA, NA, NA, NA), c(NA, NA, NA, NA, NA), testB1 = 1:5, testB2 = 1:5, c(NA, NA, NA, NA, NA), c(NA,...

Creating a function in R to search through and remove data given an NA

r,function,search,data.table,na

All credit for this goes to @Arun. testDT[testDT[, .I[Data[Bar == "Gray"] %between% c(0.55, 0.65)], list(Zone, SampleID, Color)]$V1] ...

Check if a column is na in a list

r,na

Try sapply(lst, function(x) any(colSums(!is.na(x))==0)) #[1] TRUE FALSE TRUE Update If you want to check for a particular column, for e.g. column 2 sapply(lst, function(x) all(is.na(x[,2]))) #[1] FALSE FALSE TRUE Or sapply(lst, function(x) sum(!is.na(x[,2]))==0) #[1] FALSE FALSE TRUE data df <- data.frame(col1= NA, col2=1:5, col3=c(1:3,NA, NA)) df1 <- data.frame(col1=1:5, col2=6:10, col3=11:15)...

Why don't #N/A and MAX (or MIN) play well with each other and what can I do about it

excel,max,min,na

For Excel 2010 or later: =IFERROR(LOOKUP(2,1/(C1=AGGREGATE({14,15},6,$C$1:$C$12,1)),{"MAX","MIN"}),"") Regards...

How to select the last one test without NA in r

r,select,na

You can use this solution: > t(apply(d[-1],1,function(rw) rw[range(which(!is.na(rw)))])) [,1] [,2] [1,] 62 59 [2,] 49 60 [3,] 59 34 where d is your data set. How it works: for each row of d (rows are scanned using apply(d[-1],1,...), where d[-1] excludes the first column), get the indices of non-NA test...

Collapsing rows where some are all NA, others are disjoint with some NAs

r,aggregate,dataframes,na

Try library(dplyr) DF %>% group_by(ID) %>% summarise_each(funs(sum(., na.rm=TRUE))) ...

Converting Factor Levels to Numbers

r,matrix,na

It seems like your U column should be 2 corresponding to "B", not 1. Please clarify that. You could try match() matrix(match(demodata2, LETTERS), nrow(demodata2), dimnames=dimnames(demodata2)) # Q R S T U W # [1,] 1 2 NA NA 2 2 # [2,] 2 2 2 NA 2 2 # [3,]...

R : how to get rid of all NaN values and replace them by 0 in a complex function / R file [closed]

r,nan,na,zero

I'm not sure about what you mean by a whole file/complex function, but depending on the data type you're storing the file with, it's pretty easy using is.na(): df <- data.frame(A = rep(1, 5), B = rep(1,5)) df$B[1] <- NA df$A[3] <- NA df[is.na(df)] <- 0 ...

How do you represent na in a Pandas DataFrame?

python,pandas,nan,na

Currently there is no NA value available in Pandas or NumPy. From the section "Working with missing data" in the Pandas manual (http://pandas.pydata.org/pandas-docs/stable/missing_data.html): The choice of using NaN internally to denote missing data was largely for simplicity and performance reasons. It differs from the MaskedArray approach of, for example, scikits.timeseries....

R ignore missing data

r,ignore,na

I would do something like this: idx <- which(dat1>2.5 & dat2>2.5,arr.ind=TRUE) cbind(idx,v1=dat1[idx],v2=dat2[idx]) Reproducible example: set.seed(1) dat1 <- as.data.frame(matrix(runif(12,1,5),ncol=3)) dat2 <- as.data.frame(matrix(runif(12,1,5),ncol=3)) idx <- which(dat1>2.5 & dat2>2.5,arr.ind=TRUE) cbind(idx,v1=dat1[idx],v2=dat2[idx]) # row col v1 v2 # [1,] 3 1 3.291413 4.079366 # [2,] 4 1 4.632831 2.990797 # [3,] 2 2 4.593559 4.967624...

functions na.rv(T), na.omit, is.finite, etc. don't work for the mean of a column

r,mean,na,inf

You need to do the following: #since you don't care about the Infs convert them to NAs #so that they get removed at the mean function #since we have set na.rm=TRUE df$amihud[df$amihud==Inf] <- NA library(dplyr) #you need to use summarise to calculate the means as below: res <- df %>%...

R padding time series with missing time units

r,datetime,time-series,na

How about this? It works for a small data sample. Your input data: df <- read.table(header=T, text='count time 47 "15/12/2014 06:30" 3 "15/12/2014 06:31" 431 "15/12/2014 06:34" 320 "15/12/2014 06:35" 42 "15/12/2014 06:36" 13 "15/12/2014 06:37" 383 "15/12/2014 06:38" 160 "15/12/2014 06:39"') Format the "time" column: df$time <- as.POSIXct(df$time,...

NAs introduced by coercion when labeling breaks in cut function

r,label,cut,na

As the others already stated in the comments, your "NAs introduced by coercion" is not reproducible. But let me just give you a hint on how to make the code more "scalable" and readable: x <- c(1890, 1899,1900,2001,2012,1999,1943,1944,1950,1988,1981,1988,1997,2014) brk <- seq(1890, 2020, by=10) # breaks cut(x, breaks=brk, right=FALSE, labels=paste(brk[-length(brk)], "s",...

R: deleting columns where certain percentage of values is missing [duplicate]

r,data.frame,apply,na,missing-data

x <- sample.df[ lapply( sample.df, function(x) sum(is.na(x)) / length(x) ) < 0.1 ] ...

Inserting NA after Test

r,list,na

This is one solution (brute force) with just one missing value: prac <- list(a=c(0.203,0.305,0.444,0.780,1.000,1.101,1.403), b=c(0.201,0.306,0.442,0.778,1.000,1.101,1.208,1.401)) NA.index <- which(abs(prac$b[1:length(prac$a)] - prac$a) > 0.05) newlist.a <- c(prac$a[1:NA.index-1], NA, prac$a[NA.index]) This one should be generalizable (depending on how your data is actually structured): prac <- list(a=c(0.203,0.305,0.444,0.780,1.000,1.101,1.403), b=c(0.201,0.306,0.442,0.778,1.000,1.101,1.208,1.401)) for(i in seq_along(prac$a))...

return NA value in NumericVector Rcpp unexpected behavior

r,rcpp,na

You can initialize out with NA values: #include <Rcpp.h> using namespace Rcpp; // [[Rcpp::export]] NumericVector fill_backward(NumericVector x) { int n = x.size(); NumericVector out = NumericVector(n, NumericVector::get_na()); for (int i = 0; i < n; ++i) { if (R_IsNA(x[i])) { for (int j = i+1; j < n; ++j) {...

nested ifelse in R so close to working

r,if-statement,data.frame,na

It's not clear from your example, but I think the problem is that you're handling NAs incorrectly and/or using the wrong type for the data.frame's columns. Try rewriting your code like this: #if your columns are of character type (warnings are ok) child$G4_R_2_4<-as.numeric(child$G4_R_2_4) child$G4_R_2_5<-as.numeric(child$G4_R_2_5) child$G4_R_2_5_option2<-as.numeric(child$G4_R_2_5_option2) #correct NA handling child$kg<-ifelse(is.na(child$G4_R_2_4) & child$G4_R_2_5 < child$G4_R_2_5_option2,...

Remove rows based on columns values

r,na

You can try mydata[!rowSums(is.na(mydata[,c('x1', 'x2')])),] # y x1 x2 x3 #1 1 1 1 1 #3 1 1 1 1 #5 1 4 1 1 #6 2 5 1 1 #7 2 1 8 8 #9 2 2 2 2 #10 3 5 2 NA #11 3 2 4 4...

Removing Columns Named “NA”

r,na

Looks like you have an actual NA in your names, instead of "NA". The former represents a missing value, the latter is a character string that looks like the symbol that represents the missing value. Use: df <- df[!is.na(names(df))] Illustrating with iris: > head(iris) Sepal.Length Sepal.Width Petal.Length Petal.Width Species 1...

Replace Inf in R data.table / Show number of Inf in colums

r,data.table,infinite,na

is.finite and is.infinite don't have data.frame or data.table methods like is.na has (compare methods(is.infinite) vs methods(is.na)). You could alternatively loop through the columns and then use colSums DT[, colSums(sapply(.SD, is.infinite))] # a b c # 2 0 1 Alternatively, you could use Reduce instead of colSums DT[, Reduce(`+`,...

Calculate column medians with NA's

r,na,median

Another option is library(dplyr) df %>% mutate_each(funs(median=.-median(., na.rm=TRUE)), -ID) ...

Empty rows in list as NA values in data.frame in R

r,list,lapply,na,rbind

You just need an if/else in your function: rankall <- function(rank) { split_by_state <- split(df, df$state) ranked_hospitals <- lapply(split_by_state, function (x) { indx <- x$rank==rank if(any(indx)){ return(x[indx, ]) } else { out = x[1, ] out$hospital = NA return(out) } }) } ...

Create Flag Variables in R using a function

r,function,data.frame,na

The "[<-" function can create (assign) new columns by name: > dat[ , paste0( "M_",names(dat)[-1])] <- lapply(dat[-1], function(x) as.numeric(is.na(x)) ) > dat INDEX HEIGHT LENGTH M_HEIGHT M_LENGTH 1 1 70 55 0 0 2 2 60 NA 0 1 3 3 NA 35 1 0 4 4 NA NA 1...

Creating a counterfactual group for missing values NAs

r,matching,na

It seems I will be able to handle it using the Amelia package: http://cran.r-project.org/web/packages/Amelia/vignettes/amelia.pdf#subsection.4.4 http://cran.r-project.org/web/packages/Amelia/Amelia.pdf and indeed, there is plenty of material on it on Cross Validated, e.g. http://stats.stackexchange.com/questions/95832/missing-values-nas-in-the-test-data-when-using-predict-lm-in-r @nograpes, thank you for all the hints!...

Fill NA values with the trailing row value times a growth rate?

r,plyr,dplyr,apply,na

It looks like dplyr can't access newly assigned lag values. Here is a solution that should work even if the NAs are in the middle of a column. df <- apply( df, 2, function(x){ if(sum(is.na(x)) == 0){return(x)} ## updated with optimized portion from @josilber r <- rle(is.na(x)) na.loc <-...

Replacing certain values in a data frame as NAs

r,data.frame,na

Try df <- within(df, mark <- replace(mark, names %in% c('Kate', 'Daria', 'Tom'), NA)) df # names gender mark #1 John M 1 #2 Mark M 2 #3 Larry M 3 #4 Will M 1 #5 Kate F NA #6 Daria F NA #7 Tom M NA Or df$mark[df$names %in% c('Kate',...

How to compute a running cor.test() in a data.frame with NA values in R?

r,correlation,na

You can use complete.cases to get a logical vector of complete rows (TRUE = complete); then subsetting inside ad-hoc function used for testing too library(gtools) df <- data.frame(temp=rnorm(100, 10:30), prec=rnorm(100, 1:300), humi=rnorm(100, 1:100)) df$prec[c(1:10, 25:30, 95:100)] <-NA df$humi[c(15:19, 20:25, 80:90)] <-NA my.fun <- function(x,y) { my.df <- data.frame(x,y) my.df.cmpl <-...

Replace Values in a column with NA if values from another column are not 1 [closed]

r,replace,na

Try library(sp) S@coords[,'roadtype'][S@coords[,'jointcount']!=1] <- NA S # SpatialPoints: # jointcount roadtype #[1,] 1 3 #[2,] 4 NA #[3,] 3 NA #[4,] 1 1 #[5,] 1 4 data jointcount = c(1,4,3,1,1) roadtype = c(3,2,5,1,4) S <- SpatialPoints(data.frame(jointcount,roadtype)) ...

Extract rows with unique value in one column and equal to text “NA” in another column using R

r,row,unique,na

subset(dat, !duplicated(C) & !duplicated(C, fromLast=T) & is.na(B)) # A B C # 4 R NA pink # 6 V NA yellow ...

omitting NA values with data.table

r,data.table,na

You can try either library(data.table) na.omit(RRR[, lapply(.SD, function(x) replace(x, which(x==0), NA))]) Or using set for(j in 1:ncol(RRR)){ set(RRR, i=which(RRR[[j]]==0), j=j, value=NA) } na.omit(RRR) Benchmarks set.seed(1) n <- 1000000 RRR <- data.table(matrix(rgeom(100*n,0.5), ncol=100)) RRR1 <- copy(RRR) RRR2 <- copy(RRR) RRR3 <- copy(RRR) system.time({RRR[(RRR==0)] <- NA na.omit(RRR)}) # user system elapsed #...

split dataframe in groups before each non-NA

r,split,subset,apply,na

I think library("plyr") df <- mutate(df,ID=cumsum(!is.na(df$Height))) dfsum <- ddply(df,.(ID),summarise, stems=length(ID), avg_diameter = sqrt(sum((Diameter)^2))) head(dfsum) ## ID stems avg_diameter ## 1 1 1 7.480282 ## 2 2 1 4.774648 should work ... ? To "order[] the rows of each subset acc. to desc(Diameter)", ddply(df,.(ID), arrange,desc(Diameter)) ...

Aggregate NAs in R

r,aggregate,nan,na

This is very close to what you had, but replaces mean(x, na.rm=TRUE) with a custom function which either computes the mean of the non-NA values, or supplies NA itself: R> with(tab, aggregate(b, by=list(a), FUN=function(x) if (any(is.finite(z<-na.omit(x)))) mean(z) else NA)) Group.1 x 1 1 2 2 2 2 3 3 NA...

barplot column for

r,bar-chart,na

You can use the argument useNA = "ifany" in table. tab <- table(adult$workClass, useNA = "ifany") # Federal-gov Local-gov Never-worked Private # 960 2093 7 22696 # Self-emp-inc Self-emp-not-inc State-gov Without-pay # 1116 2541 1298 14 # <NA> # 1836 By default, the name of the NA count is NA...
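
A small stand-alone illustration of useNA, with a made-up vector in place of adult$workClass:

```r
x <- c("a", "b", NA, "a", NA)  # hypothetical categories with missing values

default_tab <- table(x)                   # NA values are silently dropped
full_tab    <- table(x, useNA = "ifany")  # adds an <NA> cell with the missing count
```

Here default_tab counts 3 values in 2 cells, while full_tab counts all 5 values in 3 cells.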

NA output from sum of numbers in R

r,sum,na,func

I looked through your function line by line and found the problem here: falses = sum( acts[false.ind]) acts is a vector with length 24, false.ind is a vector of length 25. Therefore, you are trying to subset a vector element which doesn't exist. This produces an NA. If you...
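
The mechanism can be reproduced with a much shorter vector (the names acts and idx below are stand-ins, not the asker's data):

```r
acts <- c(1, 2, 3)
idx  <- c(1, 4)              # position 4 does not exist in acts

picked <- acts[idx]          # out-of-range index gives NA: 1 NA
total  <- sum(picked)        # a single NA makes the whole sum NA
safe   <- sum(picked, na.rm = TRUE)  # 1, but this hides the indexing bug
```

Using na.rm = TRUE only masks the symptom; fixing the index so that it matches the vector's length addresses the cause.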

How to convert NA from factor vector to value of 0

r,na

You can add 0 to the possible factor values levels(myVec) <- 0:2 and substitute NA values by 0 myVec[is.na(myVec)] <- 0 myVec # [1] 0 0 1 0 # Levels: 0 1 2 Or, beforehand when the factor is created... (myvec <- factor(ifelse(is.na(myVec), 0, myVec))) # [1] 0 1 2...

Aggregate raster in R with NA values

r,aggregate,na,r-raster

Thanks Josh for the guidance. Here's the modified function that produces what I'm looking for: avg.ang <- function(x, ...){ if (sum(is.na(x))==length(x)) { NA } else { round(mean.circular(circular(x, units="degrees", rotation="clock", zero=pi/2, modulo="2pi"), na.rm=TRUE)) } } The na.rm=TRUE is the key. The if/else statement is to deal with occurrences where all cells=NA...

R replacing columns by lookup to dictionary

r,data.frame,lookup,na

Since the intervals don't have gaps, you can use findInterval. I would change the lookup table to a list containing the break points and defaults for each value using dlply from plyr. ## Transform lookup table to a list with breaks for intervals library(plyr) lookup <- dlply(testdefs, .(LABMET_ID), function(x) list(breaks=c(rbind(x$lower,...

function to set NA's in a specific column of a data frame to 0 in R

r,function,na

I think you are making it harder than you need to. The code you have in the first chunk there would be fine as a function: SetNaToZero <- function(x) { x[,2][is.na(x[, 2])] <- 0 return(x) } In action: set.seed(123) dat <- data.frame(a=rnorm(10), b=sample(c(NA, 1:3), 10, replace=T)) SetNaToZero(dat) a b 1...