Token (parser)


In computing, a token is a categorized block of text. The block of text that a token categorizes is known as a lexeme. A lexical analyzer initially reads in lexemes and categorizes them according to function, giving them meaning. This assignment of meaning is known as tokenization. A token can look like anything: English words, gibberish, symbols. It just needs to be a useful part of the structured text.


Consider the following table, which tokenizes the statement "sum = 3 + 2;":

Lexeme    Token
sum       IDENT
=         ASSIGN_OP
3         NUMBER
+         ADD_OP
2         NUMBER
;         SEMICOLON

Tokens are frequently defined by regular expressions, which are understood by a lexical analyzer generator such as lex. The lexical analyzer reads in a stream of characters, groups them into lexemes, and categorizes the lexemes into tokens. This is called "tokenizing." If the lexer finds input that does not match any token definition, it will report an error.
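As a rough illustration of tokens defined by regular expressions, the sketch below uses Python's standard re module rather than lex. The token names are reused from the table earlier in the article; the patterns, the SKIP category for whitespace, and the tokenize function are assumptions made for this example, not part of any particular lexer.

```python
import re

# Token names come from the article's table; the patterns are assumed.
TOKEN_SPEC = [
    ("NUMBER",    r"\d+"),
    ("IDENT",     r"[A-Za-z_]\w*"),
    ("ASSIGN_OP", r"="),
    ("ADD_OP",    r"\+"),
    ("SEMICOLON", r";"),
    ("SKIP",      r"\s+"),   # hypothetical category: whitespace to discard
]
MASTER_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)

def tokenize(text):
    """Return a list of (lexeme, token) pairs; raise ValueError on invalid input."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER_RE.match(text, pos)
        if match is None:
            # No token definition matches here, so report an error.
            raise ValueError(f"invalid character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.group(), match.lastgroup))
        pos = match.end()
    return tokens

print(tokenize("sum = 3 + 2;"))
```

Calling tokenize on text that contains a character outside these patterns, such as "$", raises ValueError, which stands in for the error report mentioned above.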


Tokenizing is followed by parsing. From there, the interpreted data may be loaded into data structures for general use, interpretation, or compiling.
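To make the hand-off from tokenizing to parsing concrete, here is a minimal sketch of a parser that consumes a token list and loads it into a nested tuple. The grammar (IDENT ASSIGN_OP NUMBER, optionally followed by ADD_OP NUMBER pairs, then SEMICOLON), the tuple layout, and the name parse_statement are all assumptions chosen for illustration.

```python
def parse_statement(tokens):
    """Parse IDENT ASSIGN_OP NUMBER (ADD_OP NUMBER)* SEMICOLON into a nested tuple."""
    pos = 0

    def expect(kind):
        # Consume the next token if it has the expected category.
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][1] != kind:
            found = tokens[pos][1] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {kind}, found {found}")
        lexeme = tokens[pos][0]
        pos += 1
        return lexeme

    target = expect("IDENT")
    expect("ASSIGN_OP")
    tree = ("num", int(expect("NUMBER")))
    while pos < len(tokens) and tokens[pos][1] == "ADD_OP":
        pos += 1
        tree = ("add", tree, ("num", int(expect("NUMBER"))))
    expect("SEMICOLON")
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after ';'")
    return ("assign", target, tree)

# (lexeme, token) pairs as a lexical analyzer might produce them.
tokens = [("sum", "IDENT"), ("=", "ASSIGN_OP"), ("3", "NUMBER"),
          ("+", "ADD_OP"), ("2", "NUMBER"), (";", "SEMICOLON")]
print(parse_statement(tokens))
```

The parser looks only at token categories to decide structure and uses the lexemes only for values, which is why the categorization step comes first.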



Consider a text describing a calculation: "46 - number_of(cows);". The lexemes here might be "46", "-", "number_of", "(", "cows", ")", and ";". The lexical analyzer will classify "46" as a number, "-" as a character, and "number_of" as a separate token. Even the lexeme ";" has a special meaning in some languages (such as C).


Whitespace lexemes are sometimes ignored later by the syntax analyzer. A token does not need to be valid in order to be recognized as a token: "cows" and "number_of" may both be nonsense to the language, but in this example they are tokens nonetheless.
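The two points above can be sketched in a few lines of Python. The scanner below keeps whitespace lexemes as tokens so that a later stage can drop them, and it classifies "number_of" and "cows" purely by their shape, without knowing whether the language defines them. The category names (NUMBER, IDENT, WHITESPACE, CHAR) and the scan function are hypothetical, chosen only for this example.

```python
import re

# Hypothetical categories for the example text; a real language defines its own.
PATTERN = re.compile(r"""
    (?P<NUMBER>\d+)
  | (?P<IDENT>[A-Za-z_]\w*)
  | (?P<WHITESPACE>\s+)
  | (?P<CHAR>.)            # any other single character
""", re.VERBOSE)

def scan(text):
    """Yield a (lexeme, token) pair for every lexeme, whitespace included."""
    for match in PATTERN.finditer(text):
        yield match.group(), match.lastgroup

all_tokens = list(scan("46 - number_of(cows); "))

# A later stage may simply discard the whitespace tokens.
significant = [pair for pair in all_tokens if pair[1] != "WHITESPACE"]
print(significant)
```

Whether "cows" or "number_of" means anything is a question for later stages; the scanner only reports that each has the shape of an identifier.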


See also

  • Token decoration

[1] Sebesta, R. W. (2006). Concepts of Programming Languages (7th ed.), p. 177. Boston: Pearson/Addison-Wesley.


The Wikipedia article included on this page is licensed under the GFDL.