The Artima Developer Community

Programming in Scala Forum
Combinator Parsing Problem Revisited

1 reply on 1 page. Most recent reply: Apr 21, 2008 3:28 PM by Antony Stubbs

Partha Biswas

Posts: 3
Nickname: parthab
Registered: Jul, 2007

Combinator Parsing Problem Revisited Posted: Apr 16, 2008 12:14 AM
Dear everybody,
I am trying to parse various strings. One rule says that the string must contain a smaller string such as HGSDC:MSISDN=919810001671,SUD=OICK7-0; which can be described by the Java regular expression "HGS?[CEI]*".

I have compiled the following code:


import scala.util.parsing.combinator.syntactical.StandardTokenParsers
import java.util.regex.Pattern

// App replaces the Application trait, which newer Scala releases no longer provide.
object Test extends StandardTokenParsers with App {

  // Punctuation that the lexer should emit as separate tokens.
  lexical.delimiters ++= List(",", "?", "[", "]", ":", "=", ";", "-")

  // A line: an optional bracket, any number of members, an optional bracket.
  def expr: Parser[Any] = opt(startStop) ~> rep(member) <~ opt(startStop)

  def startStop = "[" | "]"

  // Built by trial and error: almost every piece is optional.
  def member: Parser[Any] =
    opt(numericLit) ~ ident ~
      rep(opt(",") ~ opt(":") ~ ident ~ opt("=") ~ opt("-") ~ opt(numericLit) ~ opt(";") <~ opt("?"))

  val lSTr = List(
    "[ Hello,Hi How Do U D oo ?]",
    "60 HGSSE:MSISDN=919810637917,SS=BAIC,BSG=TS10;",
    "53 HGSDC:MSISDN=919810001671,SUD=OICK7-0;",
    " XGSDW")

  val pat = Pattern.compile("HGS?[CEI]*")

  // Parse only the lines in which the regular expression finds a match.
  for (str <- lSTr) {
    if (pat.matcher(str).find()) {
      val tokens = new lexical.Scanner(str)
      println("input: " + str)
      val result = phrase(expr)(tokens)
      println(result)
    }
  }
}


The output looks like this:

input: 60 HGSSE:MSISDN=919810637917,SS=BAIC,BSG=TS10;

[1.55] parsed: List(((Some(60)~HGSSE)~List(((((((None~Some(:))~MSISDN)~Some(=))~None)~Some(919810637917))~None), ((((((Some(,)~None)~SS)~Some(=))~None)~None)~None), ((((((None~None)~BAIC)~None)~None)~None)~None), ((((((Some(,)~None)~BSG)~Some(=))~None)~None)~None), ((((((None~None)~TS10)~None)~None)~None)~Some(;)))))

input: 53 HGSDC:MSISDN=919810001671,SUD=OICK7-0;

[1.49] parsed: List(((Some(53)~HGSDC)~List(((((((None~Some(:))~MSISDN)~Some(=))~None)~Some(919810001671))~None), ((((((Some(,)~None)~SUD)~Some(=))~None)~None)~None), ((((((None~None)~OICK7)~None)~Some(-))~Some(0))~Some(;)))))



Requirements:
The string may or may not start with a numeric literal. I want the output as an Array[String], a List[String] or a tuple. The elements should appear exactly as they do in the input, with whitespace as the delimiter, for example:
"60","HGSSE:MSISDN=919810637917,SS=BAIC,BSG=TS10;"

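For the stated requirement alone (whitespace-delimited fields, kept verbatim), a combinator parser may not be needed. The following is only a minimal sketch using the standard library; the names WhitespaceFields and fields are made up for illustration and do not come from the post.

```scala
import java.util.regex.Pattern

// Hypothetical helper, not part of the original code.
object WhitespaceFields {
  // Same pattern as in the post. Note that with find() it is satisfied by any
  // line containing "HG", because "S?" and "[CEI]*" may both match nothing.
  private val pat = Pattern.compile("HGS?[CEI]*")

  // Returns the whitespace-separated fields of a matching line, or None
  // when the regular expression finds no match in the line.
  def fields(line: String): Option[List[String]] =
    if (pat.matcher(line).find()) Some(line.trim.split("\\s+").toList)
    else None
}
```

The trim call is there so that a line with leading whitespace does not yield an empty first field.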

First, how do I use combinator.ImplicitConversions with the ParseResult created by phrase(expr)? Or is there another way?

Also, I am not comfortable declaring ":", "=", ";" and "-" as lexical.delimiters when they are not really delimiters in my case. I want to write 'member' cleanly; what I have built here is the result of trial and error. Could you please correct it? How could I use chainl1, if it is required? Please help as soon as possible. I am a neophyte in Scala, and I also need to parse slightly different kinds of strings.
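One possible way to avoid lexical.delimiters is RegexParsers, which works on characters rather than on tokens, together with the ^^ operator to turn the nested ~ results into a plain value. The sketch below is only an illustration: it assumes that lines have the shape "[number] NAME:KEY=VALUE,KEY=VALUE;", which is inferred from the two sample lines and nothing else. The names CommandParser, Command, seqNo, name, value, param, command and parseLine are all hypothetical. On recent Scala versions it also assumes that the scala-parser-combinators module is on the classpath.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical grammar, inferred only from the two sample lines in the post.
object CommandParser extends RegexParsers {
  final case class Command(seqNo: Option[String], name: String, params: List[(String, String)])

  // RegexParsers skips whitespace before each regex or literal by default.
  def seqNo: Parser[String] = """\d+""".r
  def name: Parser[String]  = """[A-Za-z][A-Za-z0-9]*""".r
  // A value runs up to the next comma, semicolon or whitespace, so "OICK7-0" stays whole.
  def value: Parser[String] = """[^,;\s]+""".r

  def param: Parser[(String, String)] =
    name ~ ("=" ~> value) ^^ { case key ~ v => (key, v) }

  def command: Parser[Command] =
    opt(seqNo) ~ name ~ (":" ~> repsep(param, ",")) <~ ";" ^^ {
      case n ~ cmd ~ ps => Command(n, cmd, ps)
    }

  // Unwraps the ParseResult: the parsed value on success, a message otherwise.
  def parseLine(input: String): Either[String, Command] =
    parseAll(command, input) match {
      case Success(cmd, _) => Right(cmd)
      case other           => Left(other.toString)
    }
}
```

Pattern matching on Success, as in parseLine, is one way to get an ordinary value out of a ParseResult. Here repsep handles the comma-separated parameters, so chainl1 is not needed for this particular shape.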

I have several different kinds of strings (each one a line generated by a telecom switch). A document describes how to parse each type of line. The input is actually a set of files, which may contain ASCII data (the only kind I am considering at present), binary data or ASN-type data. A given file contains only one type of data. There can be 50 different types of ASCII file for a single current location, giving rise to, say, 70 different rules for parsing those files.

For example, one rule is that the line to be parsed must contain a match for the regex 'HGS?[CEI]*'. The delimiters can vary, and a line can use more than one type of delimiter. Another rule says that a line like "xxxxx[1-&&5] yzx cvbn" should be parsed and eventually produce 5 lines: "xxxxx 1 yzx cvbn", "xxxxx 2 yzx cvbn", "xxxxx 3 yzx cvbn", "xxxxx 4 yzx cvbn", "xxxxx 5 yzx cvbn".

The existing Java-based solution is to write 50 different Java programs (called Parsers), in which the guiding document is translated directly into ifs and nested ifs in rigid, monolithic code. I want to change this by providing an abstraction: a chain of functions (actually methods in Java) taken from a pre-built function library. The library grows over time; if it is not sufficient to parse a particular file, a new function is created and added to it. The framework for building the appropriate chain of functions is a design matter (a combination of Visitor, CoR, Decorator, Object Adapter, etc.) and is a separate issue. A Java method might look like 'func_1(String _fieldType, short _fieldLength, short _columnPosition, String ConvertedType, short _outputPosition)'.

However, a non-technical person is supposed to choose functions through a UI to form a chain that meets the parsing requirements of a file. The minimum that this person has to understand is what each function can do.
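The range rule mentioned above ("xxxxx[1-&&5] yzx cvbn" producing five lines) could be sketched roughly as follows. This assumes that "[a-&&b]" means the inclusive range from a to b and that the bracket is replaced by a space followed by the number, which is all the single example shows. RangeExpansion and expand are hypothetical names.

```scala
// Hypothetical helper; the meaning of "[a-&&b]" is assumed from one example.
object RangeExpansion {
  // prefix, lower bound, upper bound, suffix
  private val Range = """^(.*?)\[(\d+)-&&(\d+)\](.*)$""".r

  // Expands the first "[a-&&b]" group into one line per number;
  // lines without such a group are returned unchanged.
  def expand(line: String): List[String] = line match {
    case Range(prefix, from, to, suffix) =>
      (from.toInt to to.toInt).toList.map(n => s"$prefix $n$suffix")
    case _ => List(line)
  }
}
```

Calling RangeExpansion.expand("xxxxx[1-&&5] yzx cvbn") should give the five lines listed in the rule.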
Moreover, the functions written by different programmers may not follow a standard. This is where Scala comes into play: the scala.util.parsing.combinator package provides Parsers and Parsers.Parser, whose functions can be combined to form a parser (like 'member' and 'expr' in my code).

So the whole effort of building a library of functions, plus a framework for forming the appropriate chain of functions for a particular file, is reduced to a one-liner like 'def member: Parser[Any] = .....' in my code.
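The "chain of functions from a library" idea can also be expressed directly in Scala, independently of the combinator package. What follows is a rough sketch under the assumption that every step maps a batch of lines to a new batch of lines. Step, trimLines, keepMatching, dropSequenceNumber and hgsPipeline are made-up names; only Function.chain comes from the standard library.

```scala
import java.util.regex.Pattern

// Hypothetical function library; every step has the same shape so steps can be chained freely.
object FunctionChain {
  type Step = List[String] => List[String]

  // Strips leading and trailing whitespace from every line.
  val trimLines: Step = _.map(_.trim)

  // Keeps only the lines in which the regular expression finds a match.
  def keepMatching(regex: String): Step = {
    val pat = Pattern.compile(regex)
    lines => lines.filter(l => pat.matcher(l).find())
  }

  // Removes a leading sequence number such as "53 " if there is one.
  val dropSequenceNumber: Step = _.map(_.replaceFirst("""^\d+\s+""", ""))

  // Function.chain applies the steps from left to right.
  val hgsPipeline: Step =
    Function.chain(Seq(trimLines, keepMatching("HGS?[CEI]*"), dropSequenceNumber))
}
```

Because all steps share one type, a UI would only need to let the user pick and order steps; whether that fits the real file rules is an open question.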

Thanks in anticipation and with earnest regards ~
Partha Biswas


Antony Stubbs

Posts: 30
Nickname: astubbs
Registered: Feb, 2008

Re: Combinator Parsing Problem Revisited Posted: Apr 21, 2008 3:28 PM
These questions have been answered on the scala mailing list:
http://www.nabble.com/-scala---Scala-Combinator-Parsing-Problem-td16720562.html

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use