Convert strings of data to “Date” objects in R [duplicate]

Tag: r,date,csv

This question already has an answer here:

  • as.Date with dates in format m/d/y in R 2 answers

My problem is that the as.Date function does not convert the values in a "date" column of a data frame into Date objects.

I have a data.frame nmmaps. Here is a short portion of it.

  city     date death temp dewpoint     pm10       o3 time season
1 chic 1/1/1987   130 31.5   31.500 27.79119 4.025928    1 winter
2 chic 1/2/1987   150 33.0   29.875       NA 4.579652    2 winter
3 chic 1/3/1987   101 33.0   27.375 33.67382 3.400928    3 winter
4 chic 1/4/1987   135 29.0   28.625 40.79119 3.942595    4 winter
5 chic 1/5/1987   126 32.0   28.875       NA 4.400928    5 winter
6 chic 1/6/1987   130 40.0   35.125 41.79119 5.984261    6 winter

I imported the data from an Excel file with the following command:

nmmaps <- read.csv("chicago-nmmaps.csv", as.is = TRUE)

When I get to the point of converting with as.Date, I enter nmmaps$date <- as.Date(nmmaps$date) and get the data.frame shown below.

  city       date death temp dewpoint     pm10       o3 time season
1 chic 0001-01-19   130 31.5   31.500 27.79119 4.025928    1 winter
2 chic 0001-02-19   150 33.0   29.875       NA 4.579652    2 winter
3 chic 0001-03-19   101 33.0   27.375 33.67382 3.400928    3 winter
4 chic 0001-04-19   135 29.0   28.625 40.79119 3.942595    4 winter
5 chic 0001-05-19   126 32.0   28.875       NA 4.400928    5 winter
6 chic 0001-06-19   130 40.0   35.125 41.79119 5.984261    6 winter

Why are the dates shown this way? Also, some dates have an NA value. I would like the years to be shown as 1987, 1988, 1989, etc.

Best How To :

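If you read the R help page for as.Date (type ?as.Date), you will see that a default format is assumed when you do not specify one: as.Date tries "%Y-%m-%d" and then "%Y/%m/%d". Your strings are month/day/year, so "1/1/1987" is read as year 1, month 1, day 19 (the first two digits of 1987), and a row whose day is 13 or more would be read as an invalid month and come back as NA. To parse your data, specify the format explicitly: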

      nmmaps$date <- as.Date(nmmaps$date, format="%m/%d/%Y") 
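A minimal sketch of the difference, using a hypothetical vector dates_chr in place of nmmaps$date (the third value is made up to show where NA values can come from):

```r
# Hypothetical stand-in for nmmaps$date: month/day/year strings.
dates_chr <- c("1/1/1987", "1/2/1987", "1/13/1987")

# Without a format, as.Date() falls back to its default formats,
# which do not match month/day/year input.
guessed <- as.Date(dates_chr)

# With an explicit format: %m = month, %d = day, %Y = four-digit year.
parsed <- as.Date(dates_chr, format = "%m/%d/%Y")

print(guessed)
print(parsed)
print(format(parsed, "%Y"))  # extract the year as text
```

Note that %Y is the four-digit year; a lower-case %y would expect a two-digit year such as 87.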

Aggregating data in R

r

Using data.table

      library(data.table)
      setDT(df1)[, list(pages=paste(page, collapse="_")), list(user_id, date=as.Date(date, '%m/%d/%Y'))]

Or using dplyr

      library(dplyr)
      df1 %>%
        group_by(user_id, date=as.Date(date, '%m/%d/%Y')) %>%
        summarise(pages=paste(page, collapse='_')) ...

optimization algorithm for circular data

r,optimization,circular,maximization

I would compute all the pairs of rows in df: (pairs <- cbind(1:nrow(df), c(2:nrow(df), 1))) # [,1] [,2] # [1,] 1 2 # [2,] 2 3 # [3,] 3 4 # [4,] 4 5 # [5,] 5 6 # [6,] 6 1 You can find the best pairing with which.max:...

Skip some lines with fread

r,fread

In linux, you could use awk with fread or it can be piped with read.table. Here, I changed the delimiter to , using awk pth <- '/home/akrun/file.txt' #change it to your path v1 <- sprintf("awk '/^(ID_REF|LMN)/{ matched = 1} matched {$1=$1; print}' OFS=\",\" %s", pth) and read with fread library(data.table)...

How to plot data points at particular location in a map in R

r,google-maps,ggmap

This should get you headed in the right direction, but be sure to check out the examples pointed out by @Jaap in the comments. library(ggmap) map <- get_map(location = "Mumbai", zoom = 12) df <- data.frame(location = c("Airoli", "Andheri East", "Andheri West", "Arya Nagar", "Asalfa", "Bandra East", "Bandra West"), values...

Correlate by levels of a variable in R

r,correlation

You can put your records into a data.frame, split it by the categories, and then run the correlation for each category.

      sapply(split(data.frame(var1, var2), categories), function(x) cor(x[[1]], x[[2]]))

This can look prettier with the dplyr library

      library(dplyr)
      data.frame(var1=var1, var2=var2, categories=categories) %>%
        group_by(categories) %>%
        summarize(cor = cor(var1, var2)) ...

Appending a data frame with for if and else statements or how do put print in dataframe

r,loops,data.frame,append

It's generally not a good idea to try to add rows one at a time to a data.frame. It's better to generate all the column data at once and then put it into a data.frame. For your specific example, the ifelse() function can help

      list <- c(10, 20, 5)
      data.frame(x=list, y=ifelse(list<8, "Greater", "Less")) ...

R: Using the “names” function on a dataset created within a loop

r,paste,assign,names

A better approach would be to read the files into a list of data.frames, instead of one data.frame object per file. Assuming files is the vector of file names (as you imply above): import <- lapply(files, read.csv, header=FALSE) Then if you want to operate on each data.frame in the list...

Remove quotes to use result as dataset name

r,string

You can get the values with get or mget (for multiple objects) lst <- mget(myvector) lapply(seq_along(lst), function(i) write.csv(lst[[i]], file=paste(myvector[i], '.csv', sep='')) ...

How to split a text into two meaningful words in R

r,string-split,stemming,text-analysis

Given a list of English words you can do this pretty simply by looking up every possible split of the word in the list. I'll use the first Google hit I found for my word list, which contains about 70k lower-case words: wl <- read.table("http://www-personal.umich.edu/~jlawler/wordlist")$V1 check.word <- function(x, wl) {...

R stops displaying maps

r,google-maps,ggmap

You are just saving the map into a variable and not displaying it. Just do

      library(ggmap)
      map <- qmap('Anaheim', zoom = 10, maptype = 'roadmap')
      map

Or

      library(ggmap)
      qmap('Anaheim', zoom = 10, maptype = 'roadmap') ...

How to build a 'for' loop with input$i in R Shiny

r,loops,for-loop,shiny

Use [[ or [ if you want to subset by string names, not $. From Hadley's Advanced R, "x$y is equivalent to x[["y", exact = FALSE]]." ## Create input input <- `names<-`(lapply(landelist, function(x) sample(0:1, 1)), landelist) filterland <- c() for (landeselect in landelist) if (input[[landeselect]] == TRUE) # use `[[`...

SQL Server / C# : Filter for System.Date - results only entries at 00:00:00

c#,asp.net,sql-server,date,gridview-sorting

What happens if you change all of the filters to use 'LIKE': if (DropDownList1.SelectedValue.ToString().Equals("Start")) { FilterExpression = string.Format("Start LIKE '{0}%'", TextBox1.Text); } Then, you're not matching against an exact date (at midnight), but matching any date-times which start with that date. Update Or perhaps you could try this... if (DropDownList1.SelectedValue.ToString().Equals("Start"))...

Sleep Shiny WebApp to let it refresh… Any alternative?

r,shiny,sleep

some reproducible code would allow me to give you some example code, but in the absence of that... wrap what you currently have in another if(), checking for length = 0 (or just && it, with the NULL check first), and display your favorite placeholder message....

IBM Cognos _days_between function not working

mysql,database,date,cognos

The Cognos _days_between function works with dates, not with datetimes. Some databases, like Oracle, store all dates with a timestamp. On a query directly to the datasource, try using the database's functions to get this data instead. When possible, this is always preferable as it pushes work to the database,...

Using VLOOKUP formula or other function to compare two columns

mysql,excel,vba,date

If data in your first table starts at A2, and your other column starts at D2, then use in E2 =VLOOKUP(D2,$A$2:$B$17,2,0) Copy down as needed....

Add days, weeks, months to date using jQuery

jquery,date

jQuery is not for DateTime manipulation. It's for querying and manipulating DOM objects. For what you need, you can either implement that yourself, or use a specialized third-party library. Moment.js is pretty neat. Examples: moment().subtract(10, 'days').calendar(); // 06/12/2015 moment().subtract(6, 'days').calendar(); // Last Tuesday at 1:51 PM moment().subtract(3, 'days').calendar(); // Last...

ggplot2 & facet_wrap - eliminate vertical distance between facets

r,ggplot2

Change the panel.margin argument to panel.margin = unit(c(-0.5, 0, -0.5, 0), "lines"). For some reason the top and bottom margins need to be negative to line up perfectly. Here is the result: ...

Histogram-like summary for interval data

r,statistics,histogram

Using IRanges, you should use findOverlaps or mergeByOverlaps instead of countOverlaps. By default it doesn't return the non-matches, though; I'll leave that to you. Instead, I will show an alternative method using foverlaps() from the data.table package: require(data.table) subject <- data.table(interval = paste("int", 1:4, sep=""), start = c(2,10,12,25), end = c(7,14,18,28)) query...

how to get values from selectInput with shiny

r,shiny

You can simply use input$selectRunid like this: content(GET("http://stats", path="gentrap/alignments", query=list(runIds=input$selectRunid, userId="dev"), add_headers("X-SENTINEL-KEY"="dev")), as = "parsed") It is probably wise to add some kind of action button and trigger the download only on click....

Store every value in a sequence except some values

r

if (length(z) %% 2) { z[-c(1, ceiling(length(z)/2), length(z))] } else z[-c(1, c(1,0) + floor(length(z)/2), length(z))] ...

How to find the days b/w two long date values

javascript,jquery,date

First you need to get your timestamps in to Date() objects, which is simple using the constructor. Then you can use the below function to calculate the difference in days: var date1 = new Date(1433097000000); var date2 = new Date(1434479400000); function daydiff(first, second) { return (second - first) / (1000...

Subsetting rows by passing an argument to a function

r,subset

The problem is that you pass the condition as a string and not as a real condition, so R can't evaluate it when you want it to. If you still want to pass it as a string, you need to parse and eval it in the right place, for example: cond...

Count number of rows meeting criteria in another table - R PRogramming

r

Using dplyr for your first problem: left_join(contacts, listings, by = c("id" = "id")) %>% filter(abs(listing_date - contact_date) < 30) %>% group_by(id) %>% summarise(cnt = n()) %>% right_join(listings) And the output is: id cnt city listing_date 1 6174 2 A 2015-03-01 2 2175 3 B 2015-03-14 3 9176 1 B 2015-03-30...

Keep the second occurrence in a column in R

r,conditional,subset,find-occurrences

Here's another possible data.table solution library(data.table) setDT(df1)[, list(Value = c("uncensored", "censored"), Time = c(Time[match("uncensored", Value)], Time[(.N - match("uncensored", rev(Value))) + 2L])), by = ID] # ID Value Time # 1: 1 uncensored 3 # 2: 1 censored 5 # 3: 2 uncensored 2 # 4: 2 censored 5 Or similarly,...

Subtract time in r, forcing unit of results to minutes [duplicate]

r,posix,posixct

You can try with difftime df1$time.diff <- with(df1, difftime(time.stamp2, time.stamp1, unit='min')) df1 # time.stamp1 time.stamp2 time.diff #1 2015-01-05 15:00:00 2015-01-05 16:00:00 60 mins #2 2015-01-05 16:00:00 2015-01-05 17:00:00 60 mins #3 2015-01-05 18:00:00 2015-01-05 20:00:00 120 mins #4 2015-01-05 19:00:00 2015-01-05 20:00:00 60 mins #5 2015-01-05 20:00:00 2015-01-05 22:00:00 120...

Highlighting specific ranges on a Graph in R

r,graph,highlight

Or you could place a rectangle on the region of interest: rect(xleft=1994,xright = 1998,ybottom=range(CVD$cvd)[1],ytop=range(CVD$cvd)[2], density=10, col = "blue") ...

Serial modification of objects in R

r,oop

I would create a list of all your matrices using mget and ls (with a regular expression matching the names of your matrices) and then modify them all at once using lapply and the colnames<- and rownames<- replacement functions. Something along these lines l <- mget(ls(pattern = "m\\d+.m")) lapply(l, function(x)...

R Program Vector, record Column Percent

r,vector,percentage

Assuming that you want to get the rowSums of columns that have 'Windows' as column names, we subset the dataset ("sep1") using grep. Then get the rowSums(Sub1), divide by the rowSums of all the numeric columns (sep1[4:7]), multiply by 100, and assign the results to a new column ("newCol") Sub1...

Fitting a subset model with just one lag, using R package FitAR

r,time-series

Use GetFitARpMLE(z,4) You will get > GetFitARpMLE(z,4) $loglikelihood [1] -2350.516 $phiHat ar1 ar2 ar3 ar4 0.0000000 0.0000000 0.0000000 -0.9262513 $constantTerm [1] 0.05388392 ...

Return Column Names when True in R

r

You could loop through the rows of your data, returning the column names where the data is set with an appropriate number of NA values padded at the end: `colnames<-`(t(apply(dat == 1, 1, function(x) c(colnames(dat)[x], rep(NA, 4-sum(x))))), paste("Impair", 1:4)) # Impair1 Impair2 Impair3 Impair4 # 1 "A" NA NA NA...

Using R to Assign Treatments to Groups

r

It's easier to think of it in terms of the two exposures that aren't used, rather than the five that are. Let's limit the number of times an exposure can be excluded: draw_exc <- function(exposures,nexp,ng,max_excluded = 10){ nexc <- length(exposures)-nexp exp_rem <- exposures exc <- matrix(,ng,nexc) for (i in 1:ng){...

Converting column from military time to standard time

r,excel

Given your criteria -- that 322 is represented as 3 and 2045 is 20 -- how about dividing by 100 and then rounding towards 0 with trunc().

      time_24hr <- c(1404, 322, 1945, 1005, 945)
      trunc(time_24hr / 100) ...

How to quickly read a large txt data file (5GB) into R(RStudio) (Centrino 2 P8600, 4Gb RAM)

r,large-data

If you only have 4 GB of RAM you cannot put 5 GB of data 'into R'. You can alternatively look at the 'Large memory and out-of-memory data' section of the High Performance Computing task view in R. Packages designed for out-of-memory processes such as ff may help you. Otherwise...

Select / subset spatial data in R

r,dictionary,spatial

I'm going with the assumption you meant "to the right" since you said "Another solution might be to drawn a polygon around the Baltic Sea and only to select the points within this polygon" # your sample data pts <- read.table(text="lat long 59.979687 29.706236 60.136177 28.148186 59.331383 22.376234 57.699154 11.667305...

R: recursive function to give groups of consecutive numbers

r,if-statement,recursion,vector,integer

Your sapply call is applying fun across all values of x, when you really want it to be applying across all values of i. To get the sapply to do what I assume you want to do, you can do the following: sapply(X = 1:length(x), FUN = fun, x =...

AppleScript (or swift) add hours to time

swift,date,applescript

You need to pick out the hours as an individual variable, as shown below: set currentDate to current date set newHour to ((hours of currentDate) + 8) You can also use this for days, minutes and seconds. You can then use the variables to construct a new date...

Rbind in variable row size not giving NA's

r,rbind

You can try cSplit library(splitstackshape) setnames(cSplit(mergedDf, 'PROD_CODE', ','), paste0('X',1:4))[] # X1 X2 X3 X4 #1: PRD0900033 PRD0900135 PRD0900220 PRD0900709 #2: PRD0900097 PRD0900550 NA NA #3: PRD0900121 NA NA NA #4: PRD0900353 NA NA NA #5: PRD0900547 PRD0900614 NA NA Or using the devel version of data.table i.e. v1.9.5 library(data.table) setDT(mergedDf)[,...

Translating Stata to R: collapse

r,data.table,stata,code-translation

Your intuition is correct. collapse is the Stata equivalent of R's aggregate function, which produces a new dataset from an input dataset by applying an aggregating function (or multiple aggregating functions, one per variable) to every variable in a dataset.
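As a small illustration of that correspondence, here is a sketch with a made-up data frame (sales, region and units are hypothetical names, not from the original question). It is roughly comparable to Stata's collapse (mean) units, by(region):

```r
# Hypothetical data: one row per sale.
sales <- data.frame(
  region = c("north", "north", "south", "south"),
  units  = c(10, 20, 5, 15)
)

# Collapse to one row per region, taking the mean of units.
by_region <- aggregate(units ~ region, data = sales, FUN = mean)

print(by_region)
```

Swapping FUN for sum, median, or another summary function changes the statistic in the same way that changing the statistic name does in collapse.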

Fitted values in R forecast missing date / time component

r,time-series,forecasting

Do not use the dates in your plot, use a numeric sequence as x axis. You can use the dates as labels. Try something like this: y=GED$Mfg.Shipments.Total..USA. n=length(y) model_a1 <- auto.arima(y) plot(x=1:n,y,xaxt="n",xlab="") axis(1,at=seq(1,n,length.out=20),labels=index(y)[seq(1,n,length.out=20)], las=2,cex.axis=.5) lines(fitted(model_a1), col = 2) The result depending on your data will be something similar: ...

Format date (with different input) php

php,date,format,multilingual,jquery-validation-engine

I know that Zend has some of this logic in there zend_date (requires Zend Framework ^^), but I would just use a simple solution like this: (where you get the format from a switch statement) $date = $_POST['date']; $toConvert = DateTime::createFromFormat('d-m-Y', $date); switch($lang){ case 'de': $format = 'Y-m-d'; break; default:...

how to call Java method which returns any List from R Language? [on hold]

java,r,rjava

You can do it with rJava package. install.packages('rJava') library(rJava) .jinit() jObj=.jnew("JClass") result=.jcall(jObj,"[D","method1") Here, JClass is a Java class that should be in your ClassPath environment variable, method1 is a static method of JClass that returns double[], [D is a JNI notation for a double array. See that blog entry for...

Find multiple consecutive empty lines

r

Here's a solution for extracting the article lines only. Turned out much more complex and cryptic than I'd been hoping, but I'm pretty sure it works. Also, thanks to akrun for the test data. v1 <- c('ard','b','','','','rr','','fr','','','','','gh','d'); ind <-...

Linear multivariate regression in R

r

Multivariate multiple regression can be done with lm(). This is very well documented, but here follows a little example: rawMat <- matrix(rnorm(200), ncol=2) noise <- matrix(rnorm(200, 0, 0.2), ncol=2) B <- matrix( 1:4, ncol=2) P <- t( B %*% t(rawMat)) + noise fit <- lm(P ~ rawMat) summary( fit )...

copy a list of data.tables

r,data.table

copy() is for copying data.tables. You are using it to copy a list. Try:

      zz <- lapply(z, copy)
      zz[[1]][ , newColumn := 1 ]

Using your original code, you will see that applying copy() to the list does not make a copy of the original data.table. They are still referenced by...

How (in a vectorized manner) to retrieve single value quantities from dataframe cells containing numeric arrays?

r,dataframes,vectorization

It looks like you're trying to grab summary functions from each entry in a list, ignoring the elements set to -999. You can do this with something like: get_scalar <- function(name, FUN=max) { sapply(mydata[,name], function(x) if(all(x == -999)) NA else FUN(as.numeric(x[x != -999]))) } Note that I've changed your function...

Am I using sapply incorrectly?

r,sapply

sapply iterates through the supplied vector or list and supplies each member in turn to the function. In your case, you're getting the values 2 and 4 and then trying to index your vector again using its own values. Since the oth_let1 vector has only two members, you get NA....
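The behaviour described above can be reproduced with a small sketch. The vector letters_vec and the index values are hypothetical stand-ins, since the original question's data is not shown here:

```r
# Hypothetical two-element vector, standing in for oth_let1.
letters_vec <- c("a", "b")

# Values that sapply() hands to the function one at a time.
idx <- c(2, 4)

# Each value is used as an index; position 4 does not exist,
# so indexing past the end of the vector yields NA.
res <- sapply(idx, function(i) letters_vec[i])

print(res)
```

Iterating over seq_along(letters_vec) instead of over the values themselves keeps every index within range.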

R — frequencies within a variable for repeating values

r,count,duplicates

You can try library(data.table)#v1.9.4+ setDT(yourdf)[, .N, by = A] ...

ggplot equivalent for matplot

r,ggplot2

You can create a similar plot in ggplot, but you will need to do some reshaping of the data first.

      library(reshape2)
      # ggplot needs a dataframe
      data <- as.data.frame(data)
      # id variable for position in matrix
      data$id <- 1:nrow(data)
      # reshape to long format
      plot_data <- melt(data, id.var="id")
      # plot
      ggplot(plot_data, aes(x=id, y=value, group=variable, colour=variable)) +
        geom_point() +
        geom_line(aes(lty=variable))...