
stri_split_fixed in a data.table in R

r,data.table,stringi

You could try

    library(data.table)
    library(stringi)
    setDT(DT)[, c('V3', 'V4') := do.call(rbind.data.frame, stri_split_fixed(V2, ' << ', 2))][]
    #    V1                             V2                   V3               V4
    #1: S01           Alan Hal << Guy John             Alan Hal         Guy John
    #2: S02        Jay << Barry Wally Bart                  Jay Barry Wally Bart
    #3: S03 Bruce Dick Jean-Paul << Damien Bruce Dick Jean-Paul           Damien

Or...
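Below is a sketch of an alternative that skips rbind.data.frame. It assumes a stringi release whose stri_split_fixed() has the simplify argument; older versions may lack it. With simplify = TRUE, the pieces come back as a character matrix with one row per input string. The sample DT is rebuilt from the printed output above, so it is illustrative data rather than the question's own.

```r
library(data.table)
library(stringi)

# Sample data rebuilt from the printed output above (illustrative)
DT <- data.table(
  V1 = c("S01", "S02", "S03"),
  V2 = c("Alan Hal << Guy John",
         "Jay << Barry Wally Bart",
         "Bruce Dick Jean-Paul << Damien")
)

# n = 2 splits only at the first " << "; simplify = TRUE returns a matrix
parts <- stri_split_fixed(DT$V2, " << ", n = 2, simplify = TRUE)

# Add both new columns by reference
DT[, c("V3", "V4") := list(parts[, 1], parts[, 2])]
print(DT)
```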

How to find an integer with a comma and zeros after it (regex)?

regex,r,string,stringi

The problem with your integer regex appears to be the backslash(es). I don't know any regex engine in which you would need to escape the opening bracket of a character class, and you certainly don't want to match a literal backslash. Also, to a regex engine that understands it at...
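Here is a small base-R illustration of the point about escaping. The question's strings are not shown, so the inputs are hypothetical. The sketch also assumes the goal is to match whole values such as "12,00": digits, a comma, then only zeros. Escaping the opening bracket turns the character class into literal text, so the escaped pattern no longer matches plain digits.

```r
x <- c("12,00", "7,000", "3,50", "abc")  # hypothetical inputs

# Unescaped character class: one or more digits, a comma, then only zeros
ok <- grepl("^[0-9]+,0+$", x)
ok
# [1]  TRUE  TRUE FALSE FALSE

# Escaping the bracket makes "[" a literal character, so nothing matches here
escaped <- grepl("\\[0-9]+,0+$", x, perl = TRUE)
escaped
# [1] FALSE FALSE FALSE FALSE
```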

Extract last word in a string after comma if there are multiple words else the first word

r,string-matching,stringr,stringi

You can try sub:

    df$country <- sub('.*,\\s*', '', df$location)
    df$country
    #[1] "New Zealand" "USA"         "France"

Or

    library(stringr)
    str_extract(df$location, '\\b[^,]+$')
    #[1] "New Zealand" "USA"         "France"

...
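Since the question is tagged stringi, here is a sketch of the same extraction with stri_extract_last_regex(). The question's df is not shown, so the location strings are hypothetical. "[^,]+$" grabs the text after the last comma, or the whole string when there is no comma. stri_trim_both() then removes the leading space.

```r
library(stringi)

# Hypothetical stand-in for df$location
location <- c("Auckland, New Zealand", "Boston, MA, USA", "France")

# Text after the last comma (whole string if there is no comma), then trim spaces
country <- stri_trim_both(stri_extract_last_regex(location, "[^,]+$"))
country
# [1] "New Zealand" "USA"         "France"
```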

r ngram extraction with regex

regex,r,stringi

Here's one way using base R regex. This can easily be extended to handle arbitrary n-grams. The trick is to put the capture group inside a positive look-ahead assertion, e.g., (?=(my_overlapping_pattern)):

    x <- "I like toast and jam."
    pattern <- "(?=(\\b[A-Za-z']+\\b \\b[A-Za-z']+\\b))"
    matches <- gregexpr(pattern, x, perl = TRUE)
    # a little post-processing needed...
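Below is one possible way to do the post-processing and generalize the pattern to n words. It relies on the capture.start and capture.length attributes that gregexpr() attaches when perl = TRUE and the pattern has capture groups. The helper name ngrams() is hypothetical; the word pattern is the one from the answer.

```r
# Hypothetical helper: overlapping n-grams via a capture group inside a look-ahead
ngrams <- function(x, n) {
  word <- "\\b[A-Za-z']+\\b"
  pattern <- sprintf("(?=(%s))", paste(rep(word, n), collapse = " "))
  m <- gregexpr(pattern, x, perl = TRUE)[[1]]
  if (m[1] == -1) return(character(0))      # no match at all
  starts <- attr(m, "capture.start")[, 1]   # where each captured n-gram begins
  lens <- attr(m, "capture.length")[, 1]    # and how long it is
  substring(x, starts, starts + lens - 1)
}

ngrams("I like toast and jam.", 2)
# [1] "I like"     "like toast" "toast and"  "and jam"
ngrams("I like toast and jam.", 3)
# [1] "I like toast"   "like toast and" "toast and jam"
```

The look-ahead itself matches zero characters, so the regex engine can restart at the next word. This is what lets the captured n-grams overlap.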

Text encoding - fine on Windows, not nix

linux,r,character-encoding,stringi

    read.table("saveFile", header = FALSE, sep = "\t", quote = "\"", encoding = "latin1")

...
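Below is a self-contained sketch of that call, assuming the file really is Latin-1 encoded. It writes a small hypothetical tab-separated Latin-1 file and reads it back with encoding = "latin1". That argument marks the strings as Latin-1 without converting them. read.table() also has a fileEncoding argument, which re-encodes the input while reading instead.

```r
# Write a small Latin-1 file as raw bytes (hypothetical contents)
tmp <- tempfile(fileext = ".tsv")
lines_utf8 <- c("name\tcity", "Zo\u00eb\tK\u00f6ln")
writeLines(iconv(lines_utf8, from = "UTF-8", to = "latin1"), tmp, useBytes = TRUE)

# Read it back; encoding = "latin1" marks the non-ASCII strings as Latin-1
df <- read.table(tmp, header = TRUE, sep = "\t", quote = "\"",
                 encoding = "latin1", stringsAsFactors = FALSE)

# Convert to UTF-8 explicitly, e.g. for display or comparison
enc2utf8(df$city)
```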

How to find all expressions ending with “

html,regex,r,string,stringi

    <TABLE\b[^>]+>[\s\S]+?<TR

Try this. See the demo: http://regex101.com/r/vF0kU2/7...
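To apply the same pattern from R, here is a sketch with stringi on a hypothetical HTML string. In an R string literal, every backslash is doubled. [\s\S] matches any whitespace or non-whitespace character, which lets the lazy +? cross line breaks. A plain . would not cross them by default in ICU.

```r
library(stringi)

# Hypothetical HTML containing two tables
html <- paste0(
  '<TABLE border="1">\n  <TR><TD>a</TD></TR>\n</TABLE>\n',
  '<TABLE id="t2">\n  <TR><TD>b</TD></TR>\n</TABLE>'
)

# Each match runs from an opening <TABLE ...> tag to the first <TR after it
m <- stri_extract_all_regex(html, "<TABLE\\b[^>]+>[\\s\\S]+?<TR")[[1]]
m
```

[^>]+ needs at least one character, so a bare <TABLE> tag without attributes would not match.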

Retrieving sentence score based on values of words in a dictionary

r,dplyr,lapply,sapply,stringi

Update: Here's the easiest dplyr method I've found so far, plus a stringi function to speed things up. Provided there are no identical sentences in df$text, we can group by that column and then apply mutate(). Note: package versions are dplyr 0.4.1 and stringi 0.4.1.

    library(dplyr)
    library(stringi)
    ...
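Below is a minimal sketch of the scoring idea using only stringi and base R. It is not the answer's dplyr pipeline. Each sentence is split into lowercase words, and the values of the words found in a named dictionary vector are summed. The dictionary and sentences are hypothetical.

```r
library(stringi)

# Hypothetical dictionary: word -> value
dict <- c(good = 1, great = 2, bad = -1, awful = -2)
text <- c("The food was good and the service great",
          "An awful, bad experience",
          "Nothing to report")

# One character vector of lowercase words per sentence (punctuation dropped)
words <- stri_extract_all_words(stri_trans_tolower(text))

# Look each word up; words missing from dict give NA and are ignored in the sum
score <- vapply(words, function(w) sum(dict[w], na.rm = TRUE), numeric(1))
score
# [1]  3 -3  0
```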

Which regex to use to extract the proper information with stri_regex in R?

regex,r,stringi

Try this:

    stri_extract_first_regex(element, "(?<=gdac\\.broadinstitute\\.org_)[\\w\\.-]+")

In general, the regex (?<=start)[set]+ extracts the run of characters matching set that immediately follows the expression start. More info about ICU regular expressions: http://userguide.icu-project.org/strings/regexp...
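Here is an example on a hypothetical path of the kind the lookbehind suggests; the question's actual element vector is not shown. The dots in the lookbehind are escaped so they match only literal dots, whereas an unescaped . would match any character.

```r
library(stringi)

# Hypothetical input; the real 'element' values come from the question
element <- "analyses__2014_10_17/gdac.broadinstitute.org_ACC-TP.CopyNumber_Gistic2.Level_4.2014101700.0.0/"

# Word characters, dots and hyphens immediately after the prefix
res <- stri_extract_first_regex(element, "(?<=gdac\\.broadinstitute\\.org_)[\\w\\.-]+")
res
# [1] "ACC-TP.CopyNumber_Gistic2.Level_4.2014101700.0.0"
```

The match stops at the trailing "/" because it is not in the character set.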

Regular expression to search and replace a string in a file

regex,string,r,stringi

I think this may be what you're looking for.

    > txt <- "automatically got activated,may be we download,network services,food quality is excellent"

A made-up vector of sentences to search from:

    > searchList <- c('This is a sentence that automatically got activated', 'may be we download some music tonight', 'I work...
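Below is one way to carry the idea through, sketched with stringi under the following assumptions:

- txt is split at commas into separate phrases.
- Each sentence is checked for any phrase as a literal substring.
- The matches are replaced inside a temporary file.

The third sentence and the "<MATCH>" placeholder are hypothetical, since the original list is cut off.

```r
library(stringi)

txt <- "automatically got activated,may be we download,network services,food quality is excellent"
phrases <- stri_split_fixed(txt, ",")[[1]]

# Hypothetical sentences (the original searchList is truncated)
searchList <- c("This is a sentence that automatically got activated",
                "may be we download some music tonight",
                "nothing matches here")

# TRUE where a sentence contains at least one phrase verbatim
hits <- vapply(searchList, function(s) any(stri_detect_fixed(s, phrases)),
               logical(1), USE.NAMES = FALSE)
hits
# [1]  TRUE  TRUE FALSE

# Search and replace inside a file: read lines, replace every phrase, write back
path <- tempfile(fileext = ".txt")
writeLines(searchList, path)
lines <- stri_replace_all_fixed(readLines(path), phrases, "<MATCH>",
                                vectorize_all = FALSE)
writeLines(lines, path)
readLines(path)
```

vectorize_all = FALSE applies every phrase to every line, instead of pairing the i-th phrase with the i-th line.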

How to install stringi library from archive and install the local icu52l.zip

r,ubuntu,icu,stringi

From the INSTALL file: "The stringi package depends on the ICU4C >= 50 library." So libicu42 is far too old. If you check the install.R file, you'll find the following lines:

    mirrors <- c("http://static.rexamine.com/packages/",
                 "http://www.mini.pw.edu.pl/~gagolews/stringi/",
                 "http://www.ibspan.waw.pl/~gagolews/stringi/")

A couple of lines later you'll find something like this:

    if (!grepl("^https?://", href)) { # try to...
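To see which ICU an installed stringi actually uses, here is a quick check with stri_info(). The ">= 50" threshold is the one quoted from the INSTALL file above. Newer stringi releases may require a more recent ICU, so treat it only as the lower bound from that era.

```r
library(stringi)

info <- stri_info()
info$ICU.version   # version string of the ICU library stringi runs on
info$ICU.system    # TRUE if a system ICU is used, FALSE if the bundled copy

# Compare against the minimum quoted from stringi's INSTALL notes
numeric_version(info$ICU.version) >= "50"
```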

Replace parts of string using package stringi (regex)

regex,r,string,stringi

Not regex, but strsplit and rle with some paste magic:

    string <- c("abbccc", "bbaccc", "uffff", "aaabccccddd")
    sapply(lapply(strsplit(string, ""), rle), function(x) {
      # x$values: the characters; x$lengths: how often each repeats in a row
      paste(x$values, ifelse(x$lengths == 1, "", x$lengths), sep = "", collapse = "")
    })
    ## [1] "ab2c3"   "b2ac3"   "uf4"     "a3bc4d3"

...
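Since the question asks about stringi and regex, here is a regex-based sketch for comparison; it is not the answer's method. The pattern "(.)\\1*" matches one maximal run of a repeated character. Each run is then rewritten as its first character, followed by its length when that length is greater than one.

```r
library(stringi)

string <- c("abbccc", "bbaccc", "uffff", "aaabccccddd")

# Each element becomes a vector of runs, e.g. "abbccc" -> "a" "bb" "ccc"
runs <- stri_extract_all_regex(string, "(.)\\1*")

res <- vapply(runs, function(r) {
  n <- stri_length(r)
  # first character of each run, plus its length unless the run has length 1
  paste0(stri_sub(r, 1, 1), ifelse(n == 1, "", n), collapse = "")
}, character(1))
res
# [1] "ab2c3"   "b2ac3"   "uf4"     "a3bc4d3"
```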