
How to define a Regex in StandardTokenParsers to identify path?

Tag: regex,scala,parsing,lexical-analysis

I am writing a parser for arithmetic expressions such as: /hdfs://xxx.xx.xx.x:xxxx/path1/file1.jpg+1 I want to parse the expression, convert it from infix to postfix, and evaluate it. Part of the code is adapted from another discussion.

 class InfixToPostfix extends StandardTokenParsers {
 import lexical._

 def regexStringLit(r: Regex): Parser[String] = acceptMatch(
 "string literal matching regex " + r,
 { case  StringLit(s)  if r.unapplySeq(s).isDefined => s })
 def pathIdent: Parser[String] =regexStringLit("/hdfs://([\d\.]+):(\d+)/([\w/]+/(\w+\.\w+))".r)
 lexical.delimiters ++= List("+","-","*","/", "^","(",")",",")
 def value :Parser[Expr] = numericLit ^^ { s => Number(s) }
def variable:Parser[Expr] =  pathIdent ^^ { s => Variable(s) }
def parens:Parser[Expr] = "(" ~> expr <~ ")"

def argument:Parser[Expr] = expr <~ (","?)
def func:Parser[Expr] = ( pathIdent ~ "(" ~ (argument+) ~ ")" ^^ { case f ~ _ ~ e ~ _ => Function(f, e) })

def term = (value | parens | func | variable)

// Needed to define recursive because ^ is right-associative
def pow :Parser[Expr] = ( term ~ "^" ~ pow ^^ {case left ~ _ ~ right => BinaryOperator(left, "^", right) }|
            term)
def factor = pow * ("*" ^^^ { (left:Expr, right:Expr) => BinaryOperator(left, "*", right) } |
                    "/" ^^^ { (left:Expr, right:Expr) => BinaryOperator(left, "/", right) } )
def sum =  factor * ("+" ^^^ { (left:Expr, right:Expr) => BinaryOperator(left, "+", right) } |
                    "-" ^^^ { (left:Expr, right:Expr) => BinaryOperator(left, "-", right) } )
def expr = ( sum | term )

def parse(s:String) = {

   val tokens = new lexical.Scanner(s)
    phrase(expr)(tokens)
}

//and the rest of the code

I was able to solve the following errors with the help of this answer:

      ScalaParser.scala:192: invalid escape character
  [error]     def pathIdent: Parser[String] =regexStringLit("/hdfs://([\d\.]+):(\d+)/([\w/]+/(\w+\.\w+))".r)
  [error]                                                               ^
  [error] ScalaParser.scala:192: invalid escape character
  [error]     def pathIdent: Parser[String] =regexStringLit("/hdfs://([\d\.]+):(\d+)/([\w/]+/(\w+\.\w+))".r)
   [error]                                                                ^
   [error] ScalaParser.scala:192: invalid escape character
   [error]     def pathIdent: Parser[String] =regexStringLit("/hdfs://([\d\.]+):(\d+)/([\w/]+/(\w+\.\w+))".r)
   [error]                                                                        ^

With the change of pathIdent to this:

  def pathIdent: Parser[String] =regexStringLit("/hdfs://([\\d.]+):(\\d+)/([\\w/]+/(\\w+\\.w+))".r)

Now I am getting a run time error which says:

 [1.1] failure: string literal matching regex /hdfs://([\d\.]+):(\d+)/([\w/]+/(\w+\.\w+)) expected

/hdfs://111.33.55.2:8888/folder1/p.a3d+1
^

It was working with JavaTokenParsers, but with the current changes I had to switch to StandardTokenParsers.

Best How To :

In a double-quoted string, the backslash is an escape character. To put a literal backslash in a double-quoted string you must escape it, so "\d" should be "\\d".

Furthermore, you do not need to escape the regex dot within a character class, since the dot has no special meaning inside a character class. So "[\d\.]" can simply be "[\d.]".

You can also forgo all this escaping business by using the raw interpolator or multi-line string literals using triple quotes.
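As a minimal sketch of the three spellings mentioned above, the following compares the escaped literal, the raw interpolator, and a triple-quoted literal. The object name PathRegexDemo and the sample path are illustrative assumptions, not part of the original code; the pattern is the one from the question with the redundant dot escape removed.

```scala
import scala.util.matching.Regex

// Hypothetical demo object: three equivalent ways to write the same regex.
object PathRegexDemo {
  // Ordinary string literal: every regex backslash must be doubled.
  val escaped: Regex = "/hdfs://([\\d.]+):(\\d+)/([\\w/]+/(\\w+\\.\\w+))".r

  // raw interpolator: backslashes are passed through unchanged.
  val rawForm: Regex = raw"/hdfs://([\d.]+):(\d+)/([\w/]+/(\w+\.\w+))".r

  // Triple-quoted literal: no escape processing either.
  val tripleQuoted: Regex = """/hdfs://([\d.]+):(\d+)/([\w/]+/(\w+\.\w+))""".r

  // Sample input modelled on the path from the question (assumed for illustration).
  val sample = "/hdfs://111.33.55.2:8888/folder1/p.a3d"

  // True when all three regexes match the whole sample string.
  def matchesAll: Boolean =
    Seq(escaped, rawForm, tripleQuoted).forall(_.unapplySeq(sample).isDefined)
}
```

All three values compile to the same pattern text, so the choice is purely about readability; the raw and triple-quoted forms let you paste the regex as you would write it outside Scala.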

Spray microservice assembly deduplicate

scala,sbt,akka,spray,microservices

It seems that a transitive dependency is pulling in two different versions of metrics-core. The best fix is to use the right library dependency so that you end up with a single version of this library. Please use https://github.com/jrudolph/sbt-dependency-graph , if it is...

Finding embedded xpaths in a String

java,regex

Use {} instead of () because {} are not used in XPath expressions and therefore you will not have confusions.

Python regular expression, matching the last word

python,regex,list

Use the alternation with $: import re mystr = 'HelloWorldToYou' pat = re.compile(r'([A-Z][a-z]*)') # or your version with `.*?`: pat = re.compile(r'([A-Z].*?)(?=[A-Z]+|$)') print pat.findall(mystr) See IDEONE demo Output: ['Hello', 'World', 'To', 'You'] Regex explanation: ([A-Z][a-z]*) - A capturing group that matches [A-Z] a capital English letter followed by [a-z]* -...

Identify that a string could be a datetime object

python,regex,algorithm,python-2.7,datetime

What about fuzzyparsers: Sample inputs: jan 12, 2003 jan 5 2004-3-5 +34 -- 34 days in the future (relative to todays date) -4 -- 4 days in the past (relative to todays date) Example usage: >>> from fuzzyparsers import parse_date >>> parse_date('jun 17 2010') # my youngest son's birthday datetime.date(2010,...

How to create the javascript regular expression for number with some special symbols

javascript,regex

This matches all given examples as well: ^\$?\d+(?:[.,:]\d+)?%?$ See it in action: RegEx101 Please comment, if adjustment / further detail is required....

Please can someone help me understand the exec method for regular expressions?

javascript,regex

I don't understand why it would give me two hellos back? Because the first entry in the array is the overall match for the expression, which is then followed by the content of any capture groups the expression defines. Since the expression defines one capture group, you get back...

Java - Enforce TextField Format - UX - 00:00:00;00

java,regex,user-interface

How about using JFormattedTextField with MaskFormatter. JFormattedTextField formattedTextField = new JFormattedTextField("00:00:00;00"); try { MaskFormatter maskFormatter = new MaskFormatter("##:##:##;##"); maskFormatter.install(formattedTextField); } catch (ParseException e) { e.printStackTrace(); } More info at http://docs.oracle.com/javase/tutorial/uiswing/components/formattedtextfield.html Demo code: JFrame frame = new JFrame(""); frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE); JPanel panel = new JPanel(); JFormattedTextField...

How to Match a string with the format: “20959WC-01” in php?

php,regex

$pattern = '! ^ # start of string \d{5} # five digits [[:alpha:]]{2} # followed by two letters - # followed by a dash \d{2} # followed by two digits $ # end of string !x'; $matches = preg_match($pattern, $input); ...

jQuery / Regex: How to compare string against several substrings

jquery,regex,string,substring,substr

You could convert this to a slightly more maintainable format, without getting into regular expressions. This is one way to use an array to accomplish your goal: // Super-quick one-liner: var str = '2042038423408'; var matchCount = $.grep(['12', '23', '34', '45', '56', '67', '78', '89', '90', '01'], function(num, i) {...

How to write RegEx for inserting line break for line length more than 30 characters?

regex

Find what: ^(.{30}) Replace with: \1\n ...

How to effectively get indices of 1s for given binary string using Scala?

scala,functional-programming,higher-order-functions

You can use a filter and then map to get the index : scala> val s = "10010010" s: String = 10010010 scala> s.zipWithIndex.withFilter(_._1 == '1').map(_._2) res0: scala.collection.immutable.IndexedSeq[Int] = Vector(0, 3, 6) Note: I'm using withFilter and not filter to avoid creating a temporary collection. Or you can use collect,...

Warning: preg_match_all(): Unknown modifier '\' [duplicate]

php,regex,warnings

Use a different set of delimiters for the regex. For example, you can write preg_match_all('~[^/\s]+/\S+\.(jpg|png|gif)~', $string, $results ...

MySQL substring match using regular expression; substring contain 'man' not 'woman'

mysql,regex

A variant of n-dru pattern since you don't need to describe all the string: SELECT '#hellowomanclothing' REGEXP '(^#.|[^o]|[^w]o)man'; Note: if a tag contains 'man' and 'woman' this pattern will return 1. If you don't want that Gordon Linoff solution is what you are looking for....

Regex not working in HTML5 pattern

regex,html5

The pattern attribute has to match the entire string. Assertions check for a match, but do not count towards the total match length. Changing the second assertion to \w+ will make the pattern match the entire string. You can also skip the implied ^, leaving you with just: <input pattern="(?!34)\w+"...

Validate part of mail suffix

c#,regex

You can use this regex to test. It will ensure that after the @ there is .xx. but may also match the string @.xx.* .*@[^.]*[.]xx[.] Or this one to ensure that there is at least one character before and after the @. [email protected][^.]+[.]xx[.] ...

Reg ex matching a word

regex

You could use a negative lookahead which will exclude those having _FX following the initial alpha string ^ABD_DEF_GHIJ(?!_FX)(?:_\d{8})?$ see example here...

How to instantiate lexical.Scanner in a JavaTokenParsers class?

scala,parsing,lexical-scanner

JavaTokenParsers does not implement the Scanners trait. So you would also need to extend this trait (or a trait that extends it) in order to have access to this class. Unless your expr parser accepts the Reader as a parameter (not from its apply method), you'd need to...

Regex to remove `.` from a sub-string enclosed in square brackets

c#,.net,regex,string,replace

To remove all the dots present inside the square brackets. Regex.Replace(str, @"\.(?=[^\[\]]*\])", ""); DEMO To remove dot or ?. Regex.Replace(str, @"[.?](?=[^\[\]]*\])", ""); ...

Does there exist an algorithm for iterating through all strings that conform to a particular regex?

c#,regex,algorithm

Let's say the domain is as following String domain[] = { a, b, .., z, A, B, .. Z, 0, 1, 2, .. 9 }; Let's say the password size is 8 ArrayList allCombinations = getAllPossibleStrings2(domain,8); This is going to generate SIZE(domain) * LENGTH number of combinations, which is in...

PHP Regular Expressions Counting starting consonants in a string

php,regex

This is one way to do it, using preg_match: $string ="SomeStringExample"; preg_match('/^[b-df-hj-np-tv-z]*/i', $string, $matches); $count = strlen($matches[0]); The regular expression matches zero or more (*) case-insensitive (/i) consonants [b-df-hj-np-tv-z] at the beginning (^) of the string and stores the matched content in the $matches array. Then it's just a matter...

Extracting strings from HTML with Python wont work with regex or BeautifulSoup

python,regex,parsing,beautifulsoup,python-requests

In order to match the string with a literal backslash, you need to double-escape it in a raw string, e.g.: re.search(r'@CAD_DTA\\">(.+?)@[email protected]@CAD_LBL',result.text) ^ ^ In order to get the index of the found match, you can use start([group]) of re.MatchObject IDEONE demo: import re obj = re.search(r'@CAD_DTA\\">(.+?)@[email protected]@CAD_LBL', 'Some text [email protected]_DTA\\">I WANT...

Scala unapplySeq extractor syntax

scala,pattern-matching,scala-2.11

The equivalent non-infix version is: xs match { case List(x, _, _) => "yes" case _ => "no" } Scala specification says: An infix operation pattern p;op;q is a shorthand for the constructor or extractor pattern op(p,q). The precedence and associativity of operators in patterns is the same as in...

Swing regular expression for phone number validation

java,regex

To only allow digits, comma and spaces, you need to remove (, ) and -. Here is a way to do it with Matcher.find(): Pattern pattern = Pattern.compile("^[0-9, ]+$"); ... if (!m.find()) { evt.consume(); } And to allow an empty string, replace + with *: Pattern pattern = Pattern.compile("^[0-9, ]*$");...

Regex pass dynamic values with boundry

c#,regex,string,boundary

Your first regular expression contains a backslash followed by the letter b because of that @. The second one contains the character that represents backspace. Just put an @ in front of the string: string bound = @"\b"; ...

Regular Expression for whole world

regex,c#-4.0,vb6

You can use: Public\s+Const\s+g(?<Name>[a-zA-Z][a-zA-Z0-9]*)\s+=\s+(?<Value>False|True) demo ...

Scala running issue on eclipse

eclipse,scala

To run as a Scala application, you need to create a Scala App and not a class. In Eclipse: in Package Explorer select project/src/package; right-click New > Scala App; enter a name, e.g. Test, and click "Finish"; select Test.scala; right-click "Run As Scala Application"; see the results in the console window....

Retrieving TriangleCount

scala,apache-spark,spark-graphx

triangleCount counts number of triangles per vertex and returns Graph[Int,Int], so you have to extract vertices: scala> graph.triangleCount().vertices.collect() res0: Array[(org.apache.spark.graphx.VertexId, Int)] = Array((1,1), (3,1), (2,1)) ...

How many characters are visible like a space, but are not space characters?

php,regex

You can make use of a Unicode category \p{Zs}: Zs    Space separator $string = preg_replace('~\p{Zs}~u', ' ', $string); The \p{Zs} Unicode category class will match these space-like symbols: Character Name U+0020 SPACE U+00A0 NO-BREAK SPACE U+1680 OGHAM SPACE MARK U+2000 EN QUAD U+2001 EM QUAD U+2002 EN SPACE U+2003 EM SPACE...

Regex that allow void fractional part of number

c#,regex

Just move the dot outside of the capturing group and then make it optional. @"[+-]?\d+\.?\d*" Use anchors if necessary. @"^[+-]?\d+\.?\d*$" ...

Zipping two arrays together with index in Scala?

arrays,scala,zip

Simply do: array1.zip(array2).zipWithIndex.map { case ((a, b), i) => (a, b, i) } ...
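A small self-contained sketch of the expression above, with sample arrays chosen purely for illustration (the object name ZipWithIndexDemo and the array contents are assumptions):

```scala
// Hypothetical demo: pair up two arrays element-wise and attach the index.
object ZipWithIndexDemo {
  val array1: Array[String] = Array("a", "b", "c")
  val array2: Array[Int] = Array(10, 20, 30)

  // zip yields (a, b) pairs, zipWithIndex wraps each pair as ((a, b), i),
  // and the pattern match flattens that into a single triple.
  val zipped: Array[(String, Int, Int)] =
    array1.zip(array2).zipWithIndex.map { case ((a, b), i) => (a, b, i) }
}
```

Note that zip stops at the shorter of the two arrays, so any extra trailing elements of the longer one are dropped.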

Match line break except line beginning with specific word or blank line

regex,notepad++

Try this regex: (?<=[a-zA-Z])(\n) I used parentheses to capture the newline character. https://regex101.com/r/zS9pB4/3...

ZipList with Scalaz

list,scala,scalaz,applicative

pure for zip lists repeats the value forever, so it's not possible to define a zippy applicative instance for Scala's List (or for anything like lists). Scalaz does provide a Zip tag for Stream and the appropriate zippy applicative instance, but as far as I know it's still pretty broken....

Get all prices with $ from string into an array in Javascript

javascript,regex,currency

It’s quite trivial: RegEx string.match(/\$((?:\d|\,)*\.?\d+)/g) || [] That || [] is for no matches: it gives an empty array rather than null. Matches $99 $.99 $9.99 $9,999 $9,999.99 Explanation / # Start RegEx \$ # $ (dollar sign) ( # Capturing group (this is what you’re looking for) (?: #...

Regex with whitespaces and preceding zeros

regex,sas

You can use this simplified regex: /^[\s0]*11\s*$/ ...

Python match whole file name, not just extension

python,regex,nsregularexpression

You're not capturing the whole filename in the group. You can also use noncapturing groups with (?:...). .*\.(rom|[0-9]{3})+ # from this (.*\.(?:rom|[0-9]{3})) # to this ...

Store regex pattern as a string in PHP when regex pattern contains both single and double quotes

php,regex

The quotes are an issue but not the issue you are running into when you escape them. Your delimiter is terminating your regex just before the closing a which is giving you the unknown modifier error. It appears you don't have error reporting on though so you aren't seeing that....

Future yielding with flatMap

scala

There's no reason to flatMap in the yield. It should be another line in the for-comprehension. for { a <- fa b <- fb c <- fc d <- f(a, b, c) } yield d I don't think it can get more concise than that....
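To make the for-comprehension above runnable, here is a sketch with stand-in futures. The values of fa, fb, fc and the body of f are assumptions invented for the example; only the shape of the comprehension comes from the answer.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical demo: the final Future-returning call is just another generator.
object FutureChainDemo {
  // Stand-in for the asynchronous function f from the answer (assumed to sum its inputs).
  def f(a: Int, b: Int, c: Int): Future[Int] = Future(a + b + c)

  def run(): Int = {
    // Start the three independent futures before the comprehension so they can run concurrently.
    val fa = Future(1)
    val fb = Future(2)
    val fc = Future(3)

    val combined: Future[Int] = for {
      a <- fa
      b <- fb
      c <- fc
      d <- f(a, b, c) // desugars to flatMap, so no nested Future is produced
    } yield d

    // Blocking is only for the demo; real code would keep composing the Future.
    Await.result(combined, 5.seconds)
  }
}
```

Writing f(a, b, c) inside the yield instead would produce a Future[Future[Int]], which is why it belongs among the generators.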

javascript replace dot (not period) character

javascript,regex,replace

Try using the unicode character code, \u2022, instead: message.replace(/\u2022/, "<br />\u2022"); ...

regex - Match filename with or without extension

regex,logstash-grok

This is about as simple as I can get it: \b\w+\.?\w* See demo...

REGEX python find previous string

python,regex,string

Updated: This will check for the existence of a sentence followed by special characters. It returns false if there are no special characters, and your original sentence is in capture group 1. Updated Regex101 Example r"(.*[\w])([^\w]+)" Alternatively (without a second capture group): Regex101 Example - no second capture group r"(.*[\w])(?:[^\w]+)"...

How do I isolate the text between 2 delimiters on the left and 7 delimiters on the right in Python?

python,regex,string,split

You can use python's built-in csv module to do this. j = next(csv.reader([string])); Now j is each item delimited by a , and will include commas if the value is wrapped in ". See j[2]....

Scala (Slick) HList splitting to case classes

scala,slick

Using the tuple functionality in shapeless you could do: import shapeless._ import syntax.std.tuple._ case class Foo(a: Int, b: String) val hlist = 1 :: "a" :: 2 :: "b" :: HNil Foo.tupled(hlist.take(2).tupled) ...

How to match words in 2 list against another string of words without sub-string matching in Python?

python,regex,string,loops,twitter

Store slangNames and riskNames as sets, split the strings and check if any of the words appear in both sets slangNames = set(["Vikes", "Demmies", "D", "MS", "Contin"]) riskNames = set(["enough", "pop", "final", "stress", "trade"]) d = {1: "Vikes is not enough for me", 2:"Demmies is okay", 3:"pop a D"} for...

Convert RDD[Map[String,Double]] to RDD[(String,Double)]

scala,apache-spark,rdd

You can call flatMap with the identity function to 'flatten' the structure of your RDD. rdd.flatMap(identity) ...

Scala first program issue

scala,recursion,case,frequency

Cons operator (::) is an infix operator so if you want to get a type of List[T] and not List[List[T]] then you should write freq(c, y.filter(_ == c),(count(c,y),c)) :: list) ...

SCALA: change the separator in Array

arrays,string,scala,delimiter

Your question is unclear, but I'll take a shot. To go from: val x = Array("a","x,y","b") to "a:x,y:b" You can use mkString: x.mkString(":") ...

Get number from string

regex

Use \d+ to match one or more digits. \b(?:http:\/\/)?(?:www\.)?example\.com\/g\/(\d+)\/\w Put http:// and www. inside a capturing or non-capturing group and then make it optional by adding the ? quantifier next to that group. For both http and https, it would be (?:https?:\/\/)? DEMO...

Match a pattern preceded by a specific pattern without using a lookbehind

regex,eclipse,lookahead

A work-around for the lack of variable-length lookbehind is available in situations when your strings have a relatively small fixed upper limit on their length. For example, if you know that strings are at most 100 characters long, you could use {0,100} in place of * or {1,100} in place...

How to generalize the round methods

scala

You could use the Numeric type class def round[T](input: T, scale: Int, f: BigDecimal => T)(implicit n: Numeric[T]): T = { f(BigDecimal(n.toDouble(input)).setScale(scale, RoundingMode.HALF_UP)) } Which can be used as: round(5.525, 2, _.doubleValue) res0: Double = 5.53 round(123456789L, -5, _.longValue) res1: Long = 123500000 Another way might be to create a...

BeautifulSoup: Parsing bad Wordpress HTML

python,html,regex,wordpress,beautifulsoup

At least, you can rely on the tag names and text, navigating the DOM tree horizontally - going sideways. These are all strong, p and span (with id attribute set) tags you are showing. For example, you can get the strong text and get the following sibling: >>> from bs4...