Ocean
|
This class implements a simple scanner. More...
Data Structures | |
class | Token |
This class implements a token for the scanner. More... | |
Public Member Functions | |
Scanner (const std::shared_ptr< std::istream > &stream, float *progress=nullptr, bool *cancel=nullptr) | |
Creates a new scanner using a stream as input. More... | |
Scanner (const std::string &filename, const std::string &buffer, float *progress=nullptr, bool *cancel=nullptr) | |
Creates a new scanner using a file or a memory buffer as input. More... | |
Scanner (const std::string &filename, std::string &&buffer, float *progress=nullptr, bool *cancel=nullptr) | |
Creates a new scanner using a file or a memory buffer as input. More... | |
virtual | ~Scanner () |
Destructs a scanner. More... | |
const Token & | token () |
Returns the recent token. More... | |
const Token & | lineToken () |
Returns a line token starting at the current position. More... | |
Token | tokenPop () |
Returns the recent token and pops it afterwards. More... | |
const Token & | look () |
Returns a look-ahead to the next token. More... | |
void | pop () |
Pops the recent token. More... | |
size_t | line () const |
Returns the recent line. More... | |
size_t | column () const |
Returns the recent column. More... | |
size_t | position () const |
Returns the position of the scanner. More... | |
size_t | size () const |
Returns the size of the scanner. More... | |
const std::string & | filename () const |
Returns the name of the input file, if the input is a file. More... | |
bool | isValid () const |
Returns whether the scanner is valid and ready to use. More... | |
Static Public Member Functions | |
static bool | findNextToken (const char *pointer, const size_t size, const size_t start, size_t &tokenStart, size_t &tokenLength) |
Finds the next token in a given string starting from a specified position. More... | |
static bool | findNextToken (const char *pointer, const size_t start, size_t &tokenStart, size_t &tokenLength) |
Finds the next token in a given string starting from a specified position. More... | |
static bool | isWhitespace (const char &character) |
Returns whether a given character is a white space character. More... | |
Static Public Attributes | |
static constexpr uint32_t | invalidId = uint32_t(-1) |
Definition of an invalid keyword or symbol id. More... | |
Protected Types | |
enum | FirstChar : uint16_t { CHAR_INVALID = 0 , CHAR_CHARACTER = 1 , CHAR_IDENTIFIER = 2 , CHAR_NUMBER = 4 , CHAR_INTEGER = 8 , CHAR_KEYWORD = 16 , CHAR_STRING = 32 , CHAR_SYMBOL = 64 , CHAR_REMARK = 128 , CHAR_SPACE = 256 } |
Definition of first character types. More... | |
typedef std::unordered_map< std::string, uint32_t > | IdMap |
Definition of an unordered map mapping strings to ids. More... | |
typedef std::unordered_set< std::string > | LineRemarks |
Definition of an unordered set holding line remark symbols. More... | |
typedef std::unordered_map< std::string, std::string > | ScopeRemarks |
Definition of an unordered map mapping begin remark symbols to end remark symbols. More... | |
typedef std::array< uint16_t, 256 > | CharTable |
Definition of a character table. More... | |
Protected Member Functions | |
Scanner (float *progress, bool *cancel) | |
Creates a new scanner. More... | |
uint8_t | get (const size_t offset=0) |
Returns one character. More... | |
std::string | data (const size_t size) const |
Returns data of a specified size starting at the recent position. More... | |
std::string | data (const size_t offset, const size_t size) const |
Returns data of a specified size starting at the offset position. More... | |
void | consume (const size_t chars=1) |
Consumes one or more characters. More... | |
bool | refillIntermediateBuffer () |
Refills the intermediate buffer. More... | |
uint32_t | keywordId (const std::string &data) const |
Returns the keyword id of a given string. More... | |
uint32_t | symbolId (const std::string &data) const |
Returns the symbol id of a given string. More... | |
void | setKeywordProperty (const bool caseSensitive) |
Sets whether the keywords are case sensitive or not. More... | |
void | registerKeyword (const std::string &keyword, const uint32_t id) |
Registers a new keyword. More... | |
void | registerSymbol (const std::string &symbol, const uint32_t id) |
Registers a new symbol. More... | |
void | registerLineRemark (const std::string &lineRemark) |
Registers a line remark symbol. More... | |
void | registerScopeRemark (const std::string &begin, const std::string &end) |
Registers a scope remark symbol. More... | |
bool | registerWhiteSpaceCharacter (const uint8_t character) |
Registers a white space character. More... | |
virtual Token | readToken (const bool consumeBytes=true) |
Reads and returns the next token. More... | |
uint8_t | readWhiteSpace (bool crossLines=true) |
Reads white space. More... | |
std::string | discardNonWhiteSpace () |
Discards non white space and jumps to the first white space position. More... | |
bool | readRemark () |
Reads remark comments. More... | |
bool | readLineRemark () |
Reads a line remark comment. More... | |
bool | readScopeRemark () |
Reads a scope remark comment. More... | |
bool | readCharacter (Token &token, const bool consumeBytes) |
Tries to read a character as next token. More... | |
bool | readIdentifier (Token &token, const bool consumeBytes) |
Tries to read an identifier as next token. More... | |
bool | readInteger (Token &token, const bool consumeBytes) |
Tries to read an integer as next token. More... | |
bool | readKeyword (Token &token, const bool consumeBytes) |
Tries to read a keyword as next token. More... | |
bool | readLine (Token &token, const bool consumeBytes) |
Tries to read a remaining line as next token. More... | |
bool | readNumber (Token &token, const bool consumeBytes) |
Tries to read a number as next token. More... | |
bool | readString (Token &token, const bool consumeBytes) |
Tries to read a string as next token. More... | |
bool | readSymbol (Token &token, const bool consumeBytes) |
Tries to read a symbol as next token. More... | |
Protected Attributes | |
Token | recentToken_ |
Recent token. More... | |
Token | nextToken_ |
Next token. More... | |
std::shared_ptr< std::istream > | stream_ |
The input stream from which the scanner receives the data. More... | |
std::string | filename_ |
The name of the input file, if the input is a file. More... | |
float * | progress_ = nullptr |
The scanner's progress as a fraction, with range [0, 1]. More... | |
bool * | cancel_ = nullptr |
Cancel flag. More... | |
Memory | intermediateBuffer_ |
Local intermediate buffer. More... | |
uint8_t * | intermediateBufferPointer_ = nullptr |
The current pointer inside the intermediate buffer. More... | |
size_t | intermediateBufferSize_ = 0 |
Number of remaining characters in the intermediate buffer. More... | |
Memory | extraBuffer_ |
Local extra buffer, used if the intermediate buffer is too small. More... | |
uint8_t * | extraBufferPointer_ = nullptr |
Pointer inside the extra buffer. More... | |
size_t | extraBufferSize_ = 0 |
Number of remaining characters inside the extra buffer. More... | |
size_t | line_ = 1 |
Holds the current line. More... | |
size_t | column_ = 1 |
Holds the current column. More... | |
size_t | position_ = 0 |
Holds the current position of the scanner. More... | |
IdMap | keywordMap_ |
Map mapping keyword strings to keyword ids. More... | |
bool | keywordsAreCaseSensitive_ = true |
Determines whether all keywords are case sensitive. More... | |
IdMap | symbolMap_ |
Map mapping symbol strings to symbol ids. More... | |
LineRemarks | lineRemarks_ |
Registered line remarks. More... | |
size_t | maximalLengthLineRemarks_ = 0 |
Length of the longest line remark symbol. More... | |
ScopeRemarks | scopeRemarks_ |
Scope remarks. More... | |
size_t | maximalLengthScopeRemarks_ = 0 |
Length of the longest scope remark symbol. More... | |
CharTable | firstCharTable_ |
Table holding the definition of allowed first characters. More... | |
CharTable | followingCharTable_ |
Table holding the definition of allowed following characters. More... | |
CharTable | invalidCharTable_ |
Table holding the definition of not allowed following characters. More... | |
Static Protected Attributes | |
static constexpr size_t | minBufferSize_ = 2048 |
Definition of the minimum intermediate buffer size. More... | |
static constexpr size_t | maxBufferSize_ = 8192 |
Definition of the maximum intermediate buffer size. More... | |
Private Member Functions | |
uint8_t | getExtra (const size_t offset=0) |
Returns one character from the extra buffer. More... | |
bool | refillExtraBuffer (const size_t minIndex) |
Refills the extra buffer. More... | |
Static Private Member Functions | |
static std::shared_ptr< std::istream > | createInputStream (const std::string &filename, std::string &&buffer) |
Creates a file input stream or a string input stream depending on the given input. More... | |
static std::shared_ptr< std::istream > | createInputStream (const std::string &filename, const std::string &buffer) |
Creates a file input stream or a string input stream depending on the given input. More... | |
This class implements a simple scanner.
|
protected |
Definition of a character table.
|
protected |
Definition of an unordered map mapping strings to ids.
|
protected |
Definition of an unordered set holding line remark symbols.
|
protected |
Definition of an unordered map mapping begin remark symbols to end remark symbols.
|
protected |
|
explicit |
Creates a new scanner using a stream as input.
stream | The stream to be used as input |
progress | Optional resulting scanner progress as a fraction, with range [0, 1] |
cancel | Optional scanner cancel flag |
|
inline |
Creates a new scanner using a file or a memory buffer as input.
filename | The name of the file to be used as input, buffer must be empty |
buffer | The buffer to be used as input, filename must be empty |
progress | Optional resulting scanner progress as a fraction, with range [0, 1] |
cancel | Optional scanner cancel flag |
|
inline |
Creates a new scanner using a file or a memory buffer as input.
filename | The name of the file to be used as input, buffer must be empty |
buffer | The buffer to be used as input, filename must be empty |
progress | Optional resulting scanner progress as a fraction, with range [0, 1] |
cancel | Optional scanner cancel flag |
|
virtual |
Destructs a scanner.
|
protected |
Creates a new scanner.
The scanner can report its overall progress if a progress pointer is provided.
Beware: Make sure that the progress value exists during the whole scanning time!
Further, the scanner can be canceled via an explicit flag.
If the scanner is canceled, an end of file token is returned.
Beware: As with the progress value, the cancel flag must also exist during the whole scanning process, if provided.
progress | Optional progress parameter to forward the scanning progress with range [0, 1], use nullptr if the progress state is not necessary |
cancel | Optional cancel state to cancel the scanner progress by setting the flag to 'true', use nullptr if the cancel state is not necessary |
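The lifetime requirement for both pointers can be illustrated with a minimal, self-contained sketch. It does not use the Ocean scanner; `countWordsSketch` is a hypothetical stand-in for a scanning loop that writes its progress through a caller-owned `float` and polls a caller-owned `bool`, which is why both objects must outlive the call.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a scanning loop; this is not the Ocean implementation.
// Counts white-space-separated words, reports progress in [0, 1] and honors a cancel flag.
inline std::size_t countWordsSketch(const std::string& input, float* progress = nullptr, const bool* cancel = nullptr)
{
    std::size_t words = 0;
    bool insideWord = false;

    for (std::size_t n = 0; n < input.size(); ++n)
    {
        if (cancel != nullptr && *cancel)
        {
            // The caller requested a stop; the documented scanner returns an end of file token instead.
            break;
        }

        const char c = input[n];
        const bool isSpace = c == ' ' || c == '\t' || c == '\n' || c == '\r';

        if (!isSpace && !insideWord)
        {
            ++words;
        }
        insideWord = !isSpace;

        if (progress != nullptr)
        {
            // Written through the caller's pointer, so the pointee must still exist here.
            *progress = float(n + 1) / float(input.size());
        }
    }

    return words;
}
```

A caller would typically keep `float progress` and `bool cancel` as members of a longer-lived object (or on the stack around the whole scan) and set `cancel = true` from a UI or another thread; any synchronization needed for that is outside the scope of this sketch.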
|
inline |
Returns the recent column.
|
protected |
Consumes one or more characters.
chars | Number of characters to consume |
|
inline static private |
Creates a file input stream or a string input stream depending on the given input.
filename | The name of the file to be used as input, buffer must be empty |
buffer | The buffer to be used as input, filename must be empty |
|
inline static private |
Creates a file input stream or a string input stream depending on the given input.
filename | The name of the file to be used as input, buffer must be empty |
buffer | The buffer to be used as input, filename must be empty |
|
protected |
Returns data of a specified size starting at the offset position.
Beware: Make sure that enough pending buffer is available
offset | Start position relative to the current position |
size | Size of the data to receive |
|
protected |
Returns data of a specified size starting at the recent position.
Beware: Make sure that enough pending buffer is available
size | Size of the data to receive |
|
protected |
Discards non white space and jumps to the first white space position.
|
inline |
Returns the name of the input file, if the input is a file.
|
static |
Finds the next token in a given string starting from a specified position.
A token is enclosed by white space characters or by the borders of the given string; the length of the given string is explicitly defined by the parameter 'size'.
pointer | The pointer to the string in which the next token is to be found, must be valid |
size | The length of the given string in characters, with range [1, infinity) |
start | The first character within the given string that defines the first possible character of the token, with range [0, size - 1] |
tokenStart | The resulting start location within the given string of the found token, with range [start, size - 1] |
tokenLength | The resulting length of the found token, with range [1, size - start] |
|
static |
Finds the next token in a given string starting from a specified position.
A token is enclosed by white space characters or by the borders of the given string; the end is identified by a null character.
pointer | The pointer to the string in which the next token is to be found, can be nullptr |
start | The first character within the given string that defines the first possible character of the token, with range [0, strlen(pointer)] |
tokenStart | The resulting start location within the given string of the found token, with range [start, strlen(pointer) - 1] |
tokenLength | The resulting length of the found token, with range [1, strlen(pointer) - start] |
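The contract of both overloads can be sketched in a few lines. The functions below are hypothetical re-implementations written only from the parameter descriptions above (they are not the Ocean source), and they assume the default white space set ' ', '\t', '\n', '\r'.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for Scanner::isWhitespace(): ' ', '\t', '\n', or '\r'.
inline bool isWhitespaceSketch(const char character)
{
    return character == ' ' || character == '\t' || character == '\n' || character == '\r';
}

// Sketch of the sized overload: skips white space from 'start', then measures the token.
// Returns false if no token begins at or after 'start'.
inline bool findNextTokenSketch(const char* pointer, const std::size_t size, const std::size_t start, std::size_t& tokenStart, std::size_t& tokenLength)
{
    assert(pointer != nullptr && size >= 1);

    std::size_t begin = start;
    while (begin < size && isWhitespaceSketch(pointer[begin]))
    {
        ++begin;
    }

    if (begin >= size)
    {
        return false;
    }

    std::size_t end = begin;
    while (end < size && !isWhitespaceSketch(pointer[end]))
    {
        ++end;
    }

    tokenStart = begin;
    tokenLength = end - begin;
    return true;
}

// Sketch of the null-terminated overload: tolerates nullptr and derives the length via strlen().
inline bool findNextTokenSketch(const char* pointer, const std::size_t start, std::size_t& tokenStart, std::size_t& tokenLength)
{
    if (pointer == nullptr)
    {
        return false;
    }

    const std::size_t size = std::strlen(pointer);
    if (size == 0)
    {
        return false;
    }

    return findNextTokenSketch(pointer, size, start, tokenStart, tokenLength);
}
```

To walk over all tokens of a string, a caller would repeat the call with `start = tokenStart + tokenLength` until the function returns false.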
|
protected |
Returns one character.
offset | Offset to the recent position |
|
private |
Returns one character from the extra buffer.
offset | Offset inside the recent extra buffer |
|
inline |
Returns whether the scanner is valid and ready to use.
|
inline static |
Returns whether a given character is a white space character.
A white space character can be one of the following:
' ', '\t', '\n', or '\r'
character | The character to be checked |
|
protected |
Returns the keyword id of a given string.
data | Data to convert to a keyword |
|
inline |
Returns the recent line.
const Token& Ocean::IO::Scanner::lineToken | ( | ) |
Returns a line token starting at the current position.
A line token does not handle remarks.
const Token& Ocean::IO::Scanner::look | ( | ) |
Returns a look-ahead to the next token.
void Ocean::IO::Scanner::pop | ( | ) |
Pops the recent token.
size_t Ocean::IO::Scanner::position | ( | ) | const |
Returns the position of the scanner.
|
protected |
Tries to read a character as next token.
token | Returning token |
consumeBytes | Determines whether the scanner consumes the read characters |
|
protected |
Tries to read an identifier as next token.
token | Returning token |
consumeBytes | Determines whether the scanner consumes the read characters |
|
protected |
Tries to read an integer as next token.
token | Returning token |
consumeBytes | Determines whether the scanner consumes the read characters |
|
protected |
Tries to read a keyword as next token.
token | Returning token |
consumeBytes | Determines whether the scanner consumes the read characters |
|
protected |
Tries to read a remaining line as next token.
token | Returning token |
consumeBytes | Determines whether the scanner consumes the read characters |
|
protected |
Reads a line remark comment.
|
protected |
Tries to read a number as next token.
token | Returning token |
consumeBytes | Determines whether the scanner consumes the read characters |
|
protected |
Reads remark comments.
|
protected |
Reads a scope remark comment.
|
protected |
Tries to read a string as next token.
token | Returning token |
consumeBytes | Determines whether the scanner consumes the read characters |
|
protected |
Tries to read a symbol as next token.
token | Returning token |
consumeBytes | Determines whether the scanner consumes the read characters |
|
protected virtual |
Reads and returns the next token.
consumeBytes | Determines whether the scanner consumes the read characters |
|
protected |
Reads white space.
crossLines | Determines whether the white space may span several lines |
|
private |
Refills the extra buffer.
minIndex | Minimal index of the character needed inside the extra buffer |
|
protected |
Refills the intermediate buffer.
|
protected |
Registers a new keyword.
keyword | New keyword |
id | Id of the keyword |
|
protected |
Registers a line remark symbol.
lineRemark | Line remark symbol |
|
protected |
Registers a scope remark symbol.
begin | Begin remark symbol |
end | End remark symbol |
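How a begin/end pair might be used while scanning can be sketched as follows. This is an assumed behavior, not the Ocean implementation: `skipScopeRemarkSketch` and `ScopeRemarksSketch` are hypothetical names, and the map layout simply mirrors the documented `ScopeRemarks` typedef (begin symbol mapped to end symbol).

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Mirrors the documented ScopeRemarks typedef: begin remark symbol -> end remark symbol.
using ScopeRemarksSketch = std::unordered_map<std::string, std::string>;

// Assumed behavior: if a registered begin symbol starts at 'position', returns the position
// directly behind the matching end symbol (or text.size() if the remark is never closed).
// Returns 'position' unchanged if no scope remark starts there.
inline std::size_t skipScopeRemarkSketch(const std::string& text, const std::size_t position, const ScopeRemarksSketch& remarks)
{
    if (position > text.size())
    {
        return position;
    }

    for (const auto& remark : remarks)
    {
        const std::string& begin = remark.first;
        const std::string& end = remark.second;

        if (!begin.empty() && text.compare(position, begin.size(), begin) == 0)
        {
            const std::size_t endPosition = text.find(end, position + begin.size());

            if (endPosition == std::string::npos)
            {
                return text.size();
            }

            return endPosition + end.size();
        }
    }

    return position;
}
```

Registering `"/*"` with `"*/"` would make such a helper skip C-style block comments; a derived scanner would presumably do the equivalent registration once in its constructor.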
|
protected |
Registers a new symbol.
symbol | New symbol |
id | Id of the symbol |
|
protected |
Registers a white space character.
character | White space character to register |
|
protected |
Sets whether the keywords are case sensitive or not.
As default all keywords are case sensitive.
Beware: This property has to be set before the first keyword is registered!
caseSensitive | True, if all keywords will be case sensitive |
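One plausible reason for the ordering requirement is that keywords are normalized at registration time. The sketch below assumes exactly that; `KeywordTableSketch` is a hypothetical class whose member names only imitate the documented ones, and the lower-casing strategy is an assumption rather than the library's confirmed behavior.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical keyword table; the logic is assumed, not taken from the Ocean source.
class KeywordTableSketch
{
    public:

        // Same value as the documented Scanner::invalidId.
        static constexpr std::uint32_t invalidId = std::uint32_t(-1);

        // Must be called before the first registerKeyword() call,
        // otherwise already stored keys would not be normalized.
        void setCaseSensitive(const bool caseSensitive)
        {
            caseSensitive_ = caseSensitive;
        }

        void registerKeyword(const std::string& keyword, const std::uint32_t id)
        {
            keywordMap_[normalize(keyword)] = id;
        }

        std::uint32_t keywordId(const std::string& data) const
        {
            const auto i = keywordMap_.find(normalize(data));

            if (i == keywordMap_.end())
            {
                return invalidId;
            }

            return i->second;
        }

    private:

        // Lower-cases the string when keywords are case insensitive.
        std::string normalize(std::string value) const
        {
            if (!caseSensitive_)
            {
                std::transform(value.begin(), value.end(), value.begin(),
                    [](const unsigned char c) { return char(std::tolower(c)); });
            }

            return value;
        }

        std::unordered_map<std::string, std::uint32_t> keywordMap_;
        bool caseSensitive_ = true;
};
```

With this design, switching the property after a keyword such as "Begin" has been stored would leave a mixed-case key in the map that a lower-cased lookup can no longer find, which matches the warning above.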
size_t Ocean::IO::Scanner::size | ( | ) | const |
Returns the size of the scanner.
|
protected |
Returns the symbol id of a given string.
data | Data to convert to a symbol |
const Token& Ocean::IO::Scanner::token | ( | ) |
Returns the recent token.
Token Ocean::IO::Scanner::tokenPop | ( | ) |
Returns the recent token and pops it afterwards.
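The relationship between token(), look(), pop() and tokenPop() can be illustrated with a small mock. `TokenCursorSketch` is hypothetical: it works on a pre-split list of strings instead of a stream, and an empty string stands in for the end of file token.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mock over a pre-split token list; it only imitates the documented accessors.
class TokenCursorSketch
{
    public:

        explicit TokenCursorSketch(std::vector<std::string> tokens) :
            tokens_(std::move(tokens))
        {
        }

        // Recent token; an empty string stands in for the end of file token.
        const std::string& token() const
        {
            return at(index_);
        }

        // Token following the recent one, without consuming anything.
        const std::string& look() const
        {
            return at(index_ + 1);
        }

        // Discards the recent token so that the next one becomes the recent token.
        void pop()
        {
            if (index_ < tokens_.size())
            {
                ++index_;
            }
        }

        // Returns a copy of the recent token and pops it afterwards.
        std::string tokenPop()
        {
            std::string result = token();
            pop();
            return result;
        }

    private:

        const std::string& at(const std::size_t index) const
        {
            static const std::string endOfFile;
            return index < tokens_.size() ? tokens_[index] : endOfFile;
        }

        std::vector<std::string> tokens_;
        std::size_t index_ = 0;
};
```

Returning by value from tokenPop() is a natural choice here, because a reference to the recent token would no longer refer to the same token once it has been popped; the documented signature (`Token` instead of `const Token&`) is consistent with that reading.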
|
protected |
Cancel flag.
|
protected |
Holds the current column.
|
protected |
Local extra buffer, used if the intermediate buffer is too small.
|
protected |
Pointer inside the extra buffer.
|
protected |
Number of remaining characters inside the extra buffer.
|
protected |
The name of the input file, if the input is a file.
|
protected |
Table holding the definition of allowed first characters.
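A table of this kind is commonly a 256-entry lookup indexed by the character value, with one bit per token category. The sketch below reuses a subset of the documented FirstChar flag values; the classification itself (which characters get which flags) is an assumption for illustration and presumes an ASCII-compatible character set.

```cpp
#include <array>
#include <cstdint>

// Flag values copied from the documented FirstChar enum (subset only).
enum FirstCharSketch : std::uint16_t
{
    CHAR_INVALID = 0,
    CHAR_IDENTIFIER = 2,
    CHAR_NUMBER = 4,
    CHAR_INTEGER = 8,
    CHAR_STRING = 32,
    CHAR_SPACE = 256
};

// Mirrors the documented CharTable typedef.
using CharTableSketch = std::array<std::uint16_t, 256>;

// Adds 'flag' to every entry in the inclusive character range [first, last].
inline void addFlagSketch(CharTableSketch& table, const unsigned char first, const unsigned char last, const std::uint16_t flag)
{
    for (unsigned int c = first; c <= last; ++c)
    {
        table[c] = std::uint16_t(table[c] | flag);
    }
}

// Assumed classification: letters and '_' may start identifiers, digits may start
// integers and numbers, '"' may start a string, and the four default white space
// characters are marked as space.
inline CharTableSketch createFirstCharTableSketch()
{
    CharTableSketch table{}; // value-initialized: every entry is CHAR_INVALID

    addFlagSketch(table, 'a', 'z', CHAR_IDENTIFIER);
    addFlagSketch(table, 'A', 'Z', CHAR_IDENTIFIER);
    addFlagSketch(table, '_', '_', CHAR_IDENTIFIER);
    addFlagSketch(table, '0', '9', CHAR_NUMBER | CHAR_INTEGER);
    addFlagSketch(table, '"', '"', CHAR_STRING);
    addFlagSketch(table, ' ', ' ', CHAR_SPACE);
    addFlagSketch(table, '\t', '\t', CHAR_SPACE);
    addFlagSketch(table, '\n', '\n', CHAR_SPACE);
    addFlagSketch(table, '\r', '\r', CHAR_SPACE);

    return table;
}
```

A reader function can then test a single bit, for example `(table[c] & CHAR_INTEGER) != 0`, to decide in constant time whether a given token type can start with the character `c`.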
|
protected |
Table holding the definition of allowed following characters.
|
protected |
Local intermediate buffer.
|
protected |
The current pointer inside the intermediate buffer.
|
protected |
Number of remaining characters in the intermediate buffer.
|
protected |
Table holding the definition of not allowed following characters.
|
static constexpr |
Definition of an invalid keyword or symbol id.
|
protected |
Map mapping keyword strings to keyword ids.
|
protected |
Determines whether all keywords are case sensitive.
|
protected |
Holds the current line.
|
protected |
Registered line remarks.
|
static constexpr protected |
Definition of the maximum intermediate buffer size.
|
protected |
Length of the longest line remark symbol.
|
protected |
Length of the longest scope remark symbol.
|
static constexpr protected |
Definition of the minimum intermediate buffer size.
|
protected |
Next token.
|
protected |
Holds the current position of the scanner.
|
protected |
The scanner's progress as a fraction, with range [0, 1].
|
protected |
Recent token.
|
protected |
Scope remarks.
|
protected |
The input stream from which the scanner receives the data.
|
protected |
Map mapping symbol strings to symbol ids.