This may have been asked before, but I couldn't find it. I have a set of CSV files (439 or so) where, in a few of the files, someone also used commas in the editorial comments. As a result, I can't put the files into a data frame, because the lines no longer have the same number of elements after splitting. Anyway, the problem I'm facing looks like this:
vec1 <- paste("484,1213,0,62.0006,1,go -- late F1 max, but glide?")
vec2 <- paste("467,1387,0,62.0026,1,goes2")
ls <- list(vec1, vec2)
What I want is a data frame with six columns. If there weren't a comma in the editorial comment for vec1, I could use (and have been using, until I found this problematic example) the following:
library(plyr)  # provides ldply
df <- ldply(ls, function(x) unlist(strsplit(x[1], split = ",")))
However, I'm getting the obvious error message that the results do not all have the same length. Is there any way of getting rid of that extra comma, turning it into a semicolon, or ensuring that, if a vector has 7 elements, elements 6 and 7 are combined?
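To make the "combine 6 and 7" idea concrete, here is a minimal sketch of the behaviour I'm after, in base R. It assumes the comment is always the last field and that every line has at least six fields; `split_keep_last` is just a hypothetical helper name, not something from plyr or any other package.

```r
# Hypothetical helper: split a line on commas, but fold anything beyond
# the n-th field back into the last field (commas restored).
split_keep_last <- function(x, n = 6) {
  parts <- strsplit(x, ",", fixed = TRUE)[[1]]
  if (length(parts) > n) {
    tail_part <- paste(parts[n:length(parts)], collapse = ",")
    parts <- c(parts[seq_len(n - 1)], tail_part)
  }
  parts
}

vec1 <- "484,1213,0,62.0006,1,go -- late F1 max, but glide?"
vec2 <- "467,1387,0,62.0026,1,goes2"
ls <- list(vec1, vec2)

# Bind the per-line character vectors row-wise into a six-column data frame.
df <- as.data.frame(do.call(rbind, lapply(ls, split_keep_last)),
                    stringsAsFactors = FALSE)
```

With this, the comment for vec1 would stay in a single column with its comma intact. Lines with fewer than six fields are not handled here and would need a separate check.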
If it helps, this is how I'm reading the files into R. I'm using scan because there is other information in the files that I want. There are some odd encoding issues going on here as well, but this seems to work:
data <- scan(file, fileEncoding="latin1", blank.lines.skip = FALSE, what = "list", sep = "\n", quiet = TRUE)