The focus of this endeavour to figure out how to use Haskell now turns to the work of the author of 'regex-base', 'regex-pcre', and 'regex-tdfa'. Unfortunately for Haskell noobs like me, most of his documentation for these packages assumes that the reader is proficient in reading abstract (read: vague) hints for manipulating Haskell's type system. There are no algorithms provided on an "it just works, with a low POSSIBLE* variance of interpretation" basis.
(* in the modal logical sense, sigh *sic*.)
I'm really trying to figure this out - after all, it seems like he's set up quite a robust system. Below, expect plenty of redundancy with similar Q&A on SO and other sites. Hopefully by the time I'm done with this exploration, we'll be left with some sort of canonical summary for dummies (God forbid the APIs then change again...).
Of special interest is the manipulation of ByteString values, as such manipulations are generally much faster than String manipulations (I'm assuming you know the difference between these types in Haskell).
Let me begin by listing all the relevant resources I've run into over the past week of this.
- relevant page from Real World Haskell
- relevant answer (with link to a sibling answer) on SO
- latest docs for 'regex-base' (Text.Regex.Base), Text.Regex.Base.RegexLike, and Text.Regex.Base.Context
- latest docs for 'regex-pcre' (Text.Regex.PCRE), Text.Regex.PCRE.ByteString, and Text.Regex.PCRE.Wrap
- latest docs for 'regex-tdfa' (Text.Regex.TDFA), and Text.Regex.TDFA.ByteString
- WIP
Well, here's a basic working example with Text.Regex.TDFA and String:

    import Text.Regex.TDFA
    temp = getAllTextMatches ("foo" =~ "o" :: AllTextMatches [] String)

Here's a similar example with Text.Regex.PCRE and String:

    import Text.Regex.PCRE
    temp = getAllTextMatches ("foo" =~ "o" :: AllTextMatches [] String)

And adding one module and changing a type hint in each lets us use ByteString. Here's the TDFA example:

    import Text.Regex.TDFA
    import Data.ByteString.Char8
    temp = getAllTextMatches ((pack "foo") =~ (pack "o") :: AllTextMatches [] ByteString)

Likewise, the PCRE example:

    import Text.Regex.PCRE
    import Data.ByteString.Char8
    temp = getAllTextMatches ((pack "foo") =~ (pack "o") :: AllTextMatches [] ByteString)

That should get us started. I am going to bed now... this research and writing will continue later.
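Since the type hint alone decides what (=~) hands back, the same match can be read in several other shapes. Here is a minimal sketch of that, assuming 'regex-tdfa' is installed; the binding names (matched, firstMatch, and so on) are my own and not part of any library.

```haskell
import Text.Regex.TDFA

-- Did the pattern match at all?
matched :: Bool
matched = "foo" =~ "o"

-- Text of the first match only.
firstMatch :: String
firstMatch = "foo" =~ "o"

-- Number of non-overlapping matches.
countMatches :: Int
countMatches = "foo" =~ "o"

-- (text before the first match, the match itself, text after it)
beforeMatchAfter :: (String, String, String)
beforeMatchAfter = "foo" =~ "o"

main :: IO ()
main = do
  print matched           -- True
  print firstMatch        -- "o"
  print countMatches      -- 2
  print beforeMatchAfter  -- ("f","o","o")
```

The signatures on the bindings do the same job as the inline `::` hints in the examples above; without one of the two, the compiler cannot pick a result type.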
2012-03-08:
Tear down of AllTextMatches usage
.. specifically, along with ByteString and PCRE.
It turns out that, as suspected, there's just too much going on in K's giant type signatures. This is complicated by the use of the Array type, which is displayed with round and square brackets, as if it were composed only of tuples and lists, while being subject to further semantic conventions that require a reading of the documentation of (Array). (Actually, they're implemented completely differently from mere ordinary tuples and lists.) Furthermore, K exports all sorts of utility variations for formatting the output of each function... lists-of-arrays, arrays-of-arrays, arrays-of-lists, etc., all representing the same data in different structures. Very muddy. Nevertheless, I guess he's done a good deed by writing the general libraries for all of us.

    import Text.Regex.PCRE
    import Data.Array
    import Data.ByteString.Char8 (ByteString, pack)

    -- Uncomment exactly one alternative; only the last one is active here.
    main = print $
      -- all expressions returned by the functions below are (ByteString)s

      {-
      getAllTextMatches ((pack "abcdebxcfgfbycijk") =~ (pack "(b).*?(c)") :: AllTextMatches (Array Int) (Array Int ByteString))
      -- An Array of:
      --   Arrays,
      --     containing all matched expressions,
      --     and their matched subexpressions
      -}

      {-
      getAllTextMatches ((pack "abcdebxcfgfbycijk") =~ (pack "(b).*?(c)") :: AllTextMatches [] (Array Int ByteString))
      -- A List of:
      --   Arrays of:
      --     matched expressions,
      --     and their matched subexpressions
      -}

      {-
      getAllTextMatches ((pack "abcdebxcfgfbycijk") =~ (pack "(b).*?(c)") :: AllTextMatches (Array Int) [ByteString])
      -- An Array of:
      --   Lists of:
      --     matched expressions,
      --     and their matched subexpressions
      -}

      {-
      getAllTextMatches ((pack "abcdebxcfgfbycijk") =~ (pack "(b).*?(c)") :: AllTextMatches (Array Int) ByteString)
      -- An Array of:
      --   matched expressions
      -}

      {-
      getAllTextMatches ((pack "abcdebxcfgfbycijk") =~ (pack "b.*?c") :: AllTextMatches [] ByteString)
      -- A List of:
      --   matched expressions
      -}

      getAllTextMatches ((pack "abcdebxcfgfbycijk") =~ (pack "(b).*?(c)") :: AllTextMatches (Array Int) (MatchText ByteString))
      -- An Array of:
      --   (MatchText)s, i.e. Arrays of:
      --     matched expressions,
      --       with their (MatchOffset)s
      --       and their (MatchLength)s
      --     and their matched subexpressions
      --       with their (MatchOffset)s
      --       and their (MatchLength)s
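To make the point about Array's bracket notation concrete, here is a small sketch that uses only Data.Array (no regex package needed). The value `groups` is a hand-built, hypothetical stand-in for one match result; it follows the regex-base convention that index 0 holds the whole match and indices 1 and up hold the subexpressions.

```haskell
import Data.Array (Array, assocs, bounds, elems, listArray, (!))

-- Hypothetical stand-in for one match of "(b).*?(c)":
-- index 0 is the whole match, 1 and 2 are the two groups.
groups :: Array Int String
groups = listArray (0, 2) ["bxc", "b", "c"]

main :: IO ()
main = do
  -- Show renders bounds plus (index, value) pairs, hence the
  -- tuple-and-list look, even though the value is an Array.
  print groups           -- array (0,2) [(0,"bxc"),(1,"b"),(2,"c")]
  print (bounds groups)  -- (0,2)
  putStrLn (groups ! 0)  -- bxc   (indexing by position)
  print (elems groups)   -- ["bxc","b","c"]   (values as a plain list)
  print (assocs groups)  -- [(0,"bxc"),(1,"b"),(2,"c")]
```

So when the printed output gets hard to read, `elems` is usually the quickest way to get back to an ordinary list.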
And then you've got types (MatchArray) and (MatchText) which are woefully, arbitrarily named, despite their underlying simplicity and similarity.
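As far as I can tell from Text.Regex.Base.RegexLike, both are plain type synonyms: MatchArray is an Array Int of (MatchOffset, MatchLength) pairs, and MatchText additionally pairs each of those with the matched text. A rough sketch of how close they are, assuming 'regex-base' is installed; `sample`, `toMatchArray`, and `toTexts` are names I made up for illustration.

```haskell
import Data.Array (elems, listArray)
import Text.Regex.Base.RegexLike (MatchArray, MatchText)

-- Hand-built value describing the match "bxc" of "(b).*?(c)"
-- inside "abcdebxcfgfbycijk": (text, (offset, length)) per index.
sample :: MatchText String
sample = listArray (0, 2) [("bxc", (5, 3)), ("b", (5, 1)), ("c", (7, 1))]

-- Dropping the text from every entry leaves a MatchArray.
toMatchArray :: MatchText source -> MatchArray
toMatchArray = fmap snd

-- Dropping the positions leaves just the matched texts.
toTexts :: MatchText source -> [source]
toTexts = map fst . elems

main :: IO ()
main = do
  print (toMatchArray sample)  -- array (0,2) [(0,(5,3)),(1,(5,1)),(2,(7,1))]
  print (toTexts sample)       -- ["bxc","b","c"]
```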
The class (Extract) in Text.Regex.Base.RegexLike really should not be exposed at the same layer as the matching functions. :( I'm thinking that it should belong in some (.Internals or .Utilities) module, instead.
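For reference, this is roughly what (Extract) gives you on a String: slicing by offset and length, which is how an (offset, length) pair gets turned back into text. A minimal sketch, assuming 'regex-base' is installed; `subject` is just my example string.

```haskell
import Text.Regex.Base.RegexLike (Extract (..))

subject :: String
subject = "abcdebxcfgfbycijk"

main :: IO ()
main = do
  putStrLn (extract (5, 3) subject)  -- bxc        (offset 5, length 3)
  putStrLn (before 5 subject)        -- abcde      (everything before offset 5)
  putStrLn (after 8 subject)         -- fgfbycijk  (everything from offset 8 on)
```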
2012-03-11
Done. Figured out the Kuklewicz code, at least at the level of using his utility functions. Will have to tidy this post up later, if ever at all.