Pattern matching

In computer science, pattern matching is the act of checking for the presence of the constituents of a given pattern. In contrast to pattern recognition, the pattern is rigidly specified. Such a pattern conventionally concerns either sequences or tree structures. Pattern matching is used to test whether things have a desired structure, to find relevant structure, to retrieve the aligning parts, and to substitute the matching part with something else. Sequence patterns (specifically text strings) are often described using regular expressions and matched using corresponding algorithms, such as backtracking. Sequences can also be seen as trees that branch for each element into that element and the rest of the sequence, or as trees that immediately branch into all elements.
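To make the idea of matching a sequence pattern by backtracking concrete, here is a minimal sketch in Haskell. It is not how any particular regular expression library works. The supported syntax (literal characters, '.' for any single character, and a postfix '*') and the names matchAll, matchChar and matchStar are assumptions made for this illustration only.

```haskell
-- A tiny backtracking matcher for a small, assumed subset of regular
-- expression syntax: literals, '.' (any character), and postfix '*'
-- (zero or more repetitions of the preceding character or '.').

-- Does the whole text match the whole pattern?
matchAll :: String -> String -> Bool
matchAll [] text                = null text
matchAll (c : '*' : rest) text  = matchStar c rest text
matchAll (c : rest) (t : ts)    = matchChar c t && matchAll rest ts
matchAll _ []                   = False

-- A single pattern character against a single text character.
matchChar :: Char -> Char -> Bool
matchChar '.' _ = True
matchChar c t   = c == t

-- Try zero repetitions first; if the rest of the pattern then fails,
-- back up, consume one more matching character, and try again.
matchStar :: Char -> String -> String -> Bool
matchStar c rest text =
  matchAll rest text
    || case text of
         (t : ts) | matchChar c t -> matchStar c rest ts
         _                        -> False
```

With these definitions, matchAll "ab*c" "abbbc" succeeds, while matchAll "ab*c" "abd" fails after every number of repetitions has been tried.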


Tree patterns can be used in programming languages as a general tool to process data based on its structure. Some functional programming languages such as Haskell and ML, and the symbolic mathematics language Mathematica, have a special syntax for expressing tree patterns and a language construct for conditional execution and value retrieval based on it. For simplicity and efficiency reasons, these tree patterns lack some features that are available in regular expressions. Depending on the language, pattern matching can be used for function arguments, in case expressions, whenever new variables are bound, or only in very limited situations, such as for sequences in assignment (in Python). Often it is possible to give alternative patterns that are tried one by one, which yields a powerful conditional programming construct. Pattern matching can also be combined with guards, boolean expressions that must evaluate to true for a branch to be taken.
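As a small illustration of alternative patterns and guards in a case expression, consider the following Haskell sketch. The function name describe and the strings it returns are hypothetical and chosen only for this example.

```haskell
-- Classify a list by its shape. The alternatives are tried in order,
-- and the guards refine the single-element case.
describe :: [Int] -> String
describe xs = case xs of
  []              -> "empty"
  [x] | x < 0     -> "one negative element"
      | otherwise -> "one non-negative element"
  (_ : _ : _)     -> "two or more elements"
```

Here the pattern selects the branch by structure, and the guard then selects by value within that branch.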


Term rewriting languages rely on pattern matching as the fundamental way a program evaluates into a result. Pattern matching benefits most when the underlying data structures are as simple and flexible as possible. This is especially the case in languages with a strong symbolic flavor. In symbolic programming languages, patterns are the same kind of datatype as everything else, and can therefore be fed in as arguments to functions.
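The following Haskell sketch imitates rewriting on a small expression type. It is only an approximation of what a term rewriting language does, since the rules here are fixed function clauses rather than data. The type Expr, the chosen rules, and the names rewrite, normalize and applyRules are all assumptions made for illustration.

```haskell
-- A small expression tree (hypothetical, for illustration only).
data Expr
  = Num Integer
  | Var String
  | Add Expr Expr
  | Mul Expr Expr
  deriving (Eq, Show)

-- One rewriting step at the root of the term.
-- Each clause is a rule: a pattern on the left, a replacement on the right.
rewrite :: Expr -> Maybe Expr
rewrite (Add (Num 0) e) = Just e
rewrite (Add e (Num 0)) = Just e
rewrite (Mul (Num 1) e) = Just e
rewrite (Mul e (Num 1)) = Just e
rewrite (Mul (Num 0) _) = Just (Num 0)
rewrite (Mul _ (Num 0)) = Just (Num 0)
rewrite _               = Nothing

-- Normalize the subterms first, then apply rules at the root.
normalize :: Expr -> Expr
normalize (Add a b) = applyRules (Add (normalize a) (normalize b))
normalize (Mul a b) = applyRules (Mul (normalize a) (normalize b))
normalize e         = e

-- Keep rewriting until no rule matches any more.
applyRules :: Expr -> Expr
applyRules e = maybe e normalize (rewrite e)
```

For example, normalize applied to the term for 1 * x + y * 0 first reduces both products and then removes the addition of zero, leaving Var "x".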


Primitive patterns

The simplest pattern in pattern matching is an explicit value or a variable. For example, consider a simple function definition in Haskell syntax (function parameters are not in parentheses but are separated by spaces, and = is not assignment but definition):

 f 0 = 1 

Here, 0 is a single value pattern. Now, whenever f is given 0 as argument the pattern matches and the function returns 1. With any other argument, the matching and thus the function fail. As the syntax supports alternative patterns in function definitions, we can continue the definition extending it to take more generic arguments:

 f n = n * f (n-1) 

Here, the first n is a single variable pattern, which will match absolutely any argument and bind it to the name n to be used in the rest of the definition. In Haskell (unlike at least Hope), patterns are tried in order, so the first definition still applies in the very specific case of the input being 0, while for any other argument the function returns n * f (n-1) with n being the argument.


The wildcard pattern (often written as _) is also simple: like a variable name, it matches any value, but does not bind the value to any name.
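Putting the two clauses of f together with a wildcard example gives the following sketch. The type signatures and the helper isZero are additions for illustration and do not appear in the text above.

```haskell
-- The two clauses of f from above, tried in order: the value pattern 0
-- first, then the variable pattern n. Note that a negative argument
-- never reaches the 0 clause, so f does not terminate for such input.
f :: Integer -> Integer
f 0 = 1
f n = n * f (n - 1)

-- A wildcard matches any argument without binding it to a name.
isZero :: Integer -> Bool
isZero 0 = True
isZero _ = False
```

Swapping the two clauses of f would make the variable pattern match every argument, including 0, so the order matters.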


Tree patterns


More complex patterns can be built from the primitive ones of the previous section, usually in the same way as values are built by combining other values. The difference is that with variable and wildcard parts, a pattern does not build into a single value, but matches a group of values that are the combination of the concrete elements and the elements that are allowed to vary within the structure of the pattern.


A tree pattern describes a part of a tree by starting with a node and specifying some branches and nodes and leaving some unspecified with a variable or wildcard pattern. It may help to think of the abstract syntax tree of a programming language and algebraic data types.


In Haskell, the following line defines an algebraic data type Color that has a single data constructor ColorConstructor that wraps an integer and a string.

 data Color = ColorConstructor Integer String 

The constructor is a node in a tree and the integer and string are leaves in branches.


To make Color an abstract data type, we write functions that interface with the data type, and so we need to extract some data from it, for example just the string or just the integer part of Color.


If we pass a variable that is of type Color, how can we get the data out of this variable? For example, for a function to get the integer part of Color, we can use a simple tree pattern and write:

 integerPart (ColorConstructor theInteger _) = theInteger 

Similarly:

 stringPart (ColorConstructor _ theString) = theString 

The creation of these functions can be automated by Haskell's data record syntax.
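As a minimal sketch of that record syntax, the declaration below reuses the names integerPart and stringPart as field names, so that the compiler generates the two accessor functions. The deriving clause and the helper isNamed are additions for illustration.

```haskell
-- Record syntax: each field name becomes an accessor function,
-- e.g. integerPart :: Color -> Integer.
data Color = ColorConstructor
  { integerPart :: Integer
  , stringPart  :: String
  } deriving (Eq, Show)

-- Record patterns can name only the fields of interest.
isNamed :: String -> Color -> Bool
isNamed name (ColorConstructor { stringPart = s }) = s == name
```

Positional patterns such as ColorConstructor theInteger _ continue to work on a type declared this way.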


Filtering data with patterns

Pattern matching can be used to filter data of a certain structure. For instance, in Haskell a list comprehension could be used for this kind of filtering:

 [A x | A x <- [A 1, B 1, A 2, B 2]] 

evaluates to

 [A 1, A 2] 
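For the comprehension above to compile, A and B must be constructors of some data type. A self-contained sketch, assuming a hypothetical type Item that wraps an Int in either constructor, could look like this:

```haskell
-- Hypothetical type providing the constructors A and B used above.
data Item = A Int | B Int
  deriving (Eq, Show)

-- Elements that fail to match the pattern A x are silently skipped
-- by the list comprehension.
onlyA :: [Item] -> [Item]
onlyA items = [A x | A x <- items]
```

Applied to [A 1, B 1, A 2, B 2], onlyA keeps exactly the elements built with A.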

Pattern matching in Mathematica

In Mathematica, the only structure that exists is the tree, which is populated by symbols. In the Haskell syntax used thus far, this could be defined as

 data SymbolTree = Symbol String [SymbolTree] 

An example tree could then look like

 Symbol "a" [Symbol "b" [], Symbol "c" []] 

In the traditional, more suitable syntax, the symbols are written as they are and the levels of the tree are represented using [], so that for instance a[b,c] is a tree with a as the parent, and b and c as the children.


A pattern in Mathematica involves putting "_" at positions in that tree. For instance, the pattern

 A[_] 

will match elements such as A[1], A[2], or more generally A[x] where x is any entity. In this case, A is the concrete element, while _ denotes the piece of tree that can be varied. A symbol prepended to _ binds the match to that variable name, while a symbol appended to _ restricts the matches to nodes of that symbol.


The Mathematica function Cases filters elements of the first argument that match the pattern in the second argument:

 Cases[{a[1], b[1], a[2], b[2]}, a[_] ] 

evaluates to

 {a[1], a[2]} 


Pattern matching applies to the structure of expressions. In the example below,

 Cases[{a[b], a[b,c], a[b[c], d], a[b[c], d[e]], a[b[c], d, e]}, a[b[_],_]] 

returns

 {a[b[c],d], a[b[c],d[e]]} 

because only these elements will match the pattern a[b[_],_] above.
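To see why exactly these two elements match, the structural comparison can be sketched in Haskell. This is not how Mathematica implements Cases. It is a simplified model in which a pattern is either a wildcard or a named node with child patterns, and the names Pattern, Wild, Node, matches, cases and leaf are assumptions of this sketch.

```haskell
-- Trees of symbols, as in the Haskell rendering of Mathematica expressions.
data SymbolTree = Symbol String [SymbolTree]
  deriving (Eq, Show)

-- A pattern is a wildcard (standing in for _) or a concrete node
-- whose children are themselves patterns.
data Pattern = Wild | Node String [Pattern]

-- A node pattern matches when the symbol agrees, the number of
-- children agrees, and every child pattern matches its child tree.
matches :: Pattern -> SymbolTree -> Bool
matches Wild _ = True
matches (Node name pats) (Symbol sym kids) =
  name == sym
    && length pats == length kids
    && and (zipWith matches pats kids)

-- Keep the trees that match the pattern, in the spirit of Cases.
cases :: [SymbolTree] -> Pattern -> [SymbolTree]
cases trees pat = filter (matches pat) trees

-- Convenience constructor for a symbol without children.
leaf :: String -> SymbolTree
leaf s = Symbol s []
```

In this model the pattern a[b[_],_] is written Node "a" [Node "b" [Wild], Wild]. It rejects a[b] and a[b[c],d,e] because the number of children differs, and a[b,c] because b has no child for the inner wildcard to match.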


In Mathematica, it is also possible to extract structures as they are created in the course of computation, regardless of how or where they appear. The function Trace can be used to monitor a computation, and return the elements that arise which match a pattern. For example, we can define the Fibonacci sequence as

 fib[0|1] := 1
 fib[n_] := fib[n-1] + fib[n-2]

Then, we can ask the question: Given fib[3], what is the sequence of recursive Fibonacci calls?

 Trace[fib[3], fib[_]] 

returns a structure that represents the occurrences of the pattern fib[_] in the computational structure:

 {fib[3],{fib[2],{fib[1]},{fib[0]}},{fib[1]}} 

Declarative programming

In symbolic programming languages, it is easy to have patterns as arguments to functions or as elements of data structures. A consequence of this is the ability to use patterns to declaratively make statements about pieces of data and to flexibly instruct functions how to operate.


For instance, the Mathematica function Compile can be used to make more efficient versions of the code. In the following example the details do not particularly matter; what matters is that the subexpression {{com[_], _Integer}} instructs Compile that expressions of the form com[_] can be assumed to be integers for the purposes of compilation:

 com[i_] := Binomial[2i, i]
 Compile[{x, {i, _Integer}}, x^com[i], {{com[_], _Integer}}]

Mailboxes in Erlang also work this way.


Pattern matching and strings

By far the most common form of pattern matching involves strings of characters. In many programming languages, a particular syntax of strings is used to represent regular expressions, which are patterns describing string characters.


However, it is possible to perform some string pattern matching within the same framework that has been discussed throughout this article.


Tree patterns for strings

In Mathematica, strings are represented as trees of root StringExpression and all the characters in order as children of the root. Thus, to match "any amount of trailing characters", a new wildcard ___ is needed in contrast to _ that would match only a single character.


In Haskell and functional programming languages in general, strings are represented as functional lists of characters. A functional list is defined as an empty list, or an element constructed on an existing list. In Haskell syntax:

 []   -- an empty list
 x:xs -- an element x constructed on a list xs

The structure for a list with some elements is thus element:list. When pattern matching, we assert that a certain piece of data is equal to a certain pattern. For example, in the function:

 head (element:list) = element 

we assert that the first element of head's argument is called element, and the function returns this. We know that this is the first element because of the way lists are defined, a single element constructed onto a list. This single element must be the first. The empty list would not match the pattern at all, as an empty list does not have a head (the first element that is constructed).


In the example, we have no use for list, so we can disregard it, and thus write the function:

 head (element:_) = element 

The equivalent Mathematica transformation is expressed as

 head[element_, ___]:=element 
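Returning to the Haskell definition of head: to try it in a source file, the standard head has to be hidden, since the Prelude already exports a function of that name. The sketch below does this and adds a total variant. The name safeHead and the use of Maybe are additions for illustration.

```haskell
import Prelude hiding (head)

-- The definition from the text: matches only non-empty lists.
head :: [a] -> a
head (element:_) = element

-- A total variant that also covers the empty list, which the
-- pattern above cannot match.
safeHead :: [a] -> Maybe a
safeHead []          = Nothing
safeHead (element:_) = Just element
```

Because a String is a list of characters, both functions work on strings as well as on any other list.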

Example string patterns

In Mathematica, for instance,

 StringExpression["a", _] 

will match a string that has two characters and begins with "a".


The same pattern in Haskell:

 ['a', _] 

Symbolic entities can be introduced to represent many different classes of relevant features of a string. For instance,

 StringExpression[LetterCharacter, DigitCharacter] 

will match a string that consists of a letter first, and then a number.


In Haskell, guards could be used to achieve the same matches:

 [letter, digit] | isAlpha letter && isDigit digit 
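The Haskell patterns in this section are fragments. Wrapped into complete functions they could look like the sketch below, in which the function names startsWithA and letterThenDigit and the Bool results are assumptions made for illustration.

```haskell
import Data.Char (isAlpha, isDigit)

-- Exactly two characters, the first of which is 'a'.
startsWithA :: String -> Bool
startsWithA ['a', _] = True
startsWithA _        = False

-- Exactly two characters: a letter followed by a digit.
-- The pattern fixes the length, and the guard checks the character classes.
letterThenDigit :: String -> Bool
letterThenDigit [letter, digit]
  | isAlpha letter && isDigit digit = True
letterThenDigit _ = False
```

When the guard fails, matching falls through to the next clause, so a two-character string such as "7x" ends up in the catch-all case.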

The main advantage of symbolic string manipulation is that it can be completely integrated with the rest of the programming language, rather than being a separate, special purpose subunit. The entire power of the language can be leveraged to build up the patterns themselves or to analyze and transform the programs that contain them.


History

The first computer programs to use pattern matching were text editors. At Bell Labs, Ken Thompson extended the seeking and replacing features of the QED editor to accept regular expressions. Early programming languages with pattern matching constructs include SNOBOL from 1962, NPL from 1977, and KRC from 1981. The first programming language with tree-based pattern matching features was Fred McBride's extension of LISP, in 1970.[1]

See also: Regular expression#History


SNOBOL

Main article: SNOBOL

SNOBOL (StriNg Oriented symBOlic Language) is a computer programming language developed between 1962 and 1967 at AT&T Bell Laboratories by David J. Farber, Ralph E. Griswold and Ivan P. Polonsky.

See also

  • Pattern recognition for fuzzy patterns
  • Category:Pattern matching for articles about string pattern matching
  • AIML for an AI language based on matching patterns in speech
  • SNOBOL for a programming language based on one kind of pattern matching
  • PCRE Perl Compatible Regular Expressions, a common modern implementation of string pattern matching ported to many languages
  • Unification, a similar concept in Prolog.
  • Tom (pattern matching language)


References

  • Pattern matching in The Free On-line Dictionary of Computing, Editor Denis Howe.
  • Python Reference Manual, chapter 6.3 Assignment statements.

The Wikipedia article included on this page is licensed under the GFDL.