TextScanner Update

Many years ago, I posted a TextScanner on flipcode as a Code of the Day. In the years since, I've continued to use it, and it has accumulated a few new features and a couple of bug fixes. I know a lot of people use it, so here's an update. New features include Unicode support, some path parsing functions, Python-esque split and join functions (and a much faster C-array-oriented one), and support for parsing ILM's OpenEXR types. The original sample code is at the bottom to demonstrate the spirit of usage; it hasn't been recompiled against this latest version of the library. Note that the Unicode <-> UTF-8 conversion routines are specific to Windows.

Note also that the TextSplitter class is not production-quality code. It uses magic numbers, modifies the string that was passed in, and doesn't bounds check; if it overruns, it'll simply stomp on memory. This perfectly suited my needs, and so I left it at that. Writing it properly is left as an exercise for the reader ;)
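For readers who want a starting point for that exercise, here is a minimal sketch of an in-place, C-array-oriented split that does check its bounds. It is not the TextSplitter class from the download: the name SplitInPlace, its parameters, and the choice to leave the unsplit remainder in the last slot when the array fills up are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, NOT part of TextScanner or TextSplitter.
// Splits a null-terminated string in place: each delimiter is overwritten
// with '\0' and a pointer to each piece is stored in parts[].
// At most maxParts pointers are written. If the array fills up, splitting
// stops and the last stored piece keeps the unsplit remainder.
// Returns the number of pieces stored.
inline std::size_t SplitInPlace(char* text, char delim,
                                char** parts, std::size_t maxParts)
{
    assert(text != nullptr && parts != nullptr);
    if (maxParts == 0)
        return 0; // nowhere to store anything, so leave the text untouched

    std::size_t count = 0;
    parts[count++] = text; // the first piece always starts at the beginning

    for (char* p = text; *p != '\0'; ++p) {
        if (*p != delim)
            continue;
        if (count == maxParts)
            break; // out of room: stop before writing past the array
        *p = '\0'; // terminate the previous piece
        parts[count++] = p + 1;
    }
    return count;
}
```

Like the original, this modifies the caller's string, so it only suits buffers you own. The difference is that the capacity is passed in by the caller instead of being a magic number, and the function never writes past it.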

Download TextScanner.h and TextScanner.cpp

Here's the original blurb from flipcode:

This is a text scanner that's stood me in good stead for many years. I've written zillions of parsers with it, and a couple of compilers. It's really minimal, and stateless. As a minimal stateless thing, it is simply a C style interface wrapped in a namespace. You can certainly wrap it in a class if you wish, which is how this code started out, but I've found it much more flexible as a C style API.

It operates on a block of text loaded into memory. There are no files or iostreams involved. stdint.h is lifted out of the C99 specification. You can substitute any typedefs you wish; I used it because it was convenient. There are obviously lots of things not included, like substring matching, and some odd things included, like scanning past C++ style comments. Basically, the functions that got used a million zillion times over the years have stayed, and all the other stuff is gone.

The interface is really easy to use. You pass in a pointer to where you are now in the text, and a pointer to the end of the text. The routine you call does its thing, and passes back a pointer to the next edible character. You can use that pointer as the new current pointer, or you can forget it and reuse your old current pointer to do something else.

I've attached a simple Wavefront OBJ file reader to demonstrate how the TextScanner works. Use the code any way you wish; you assume all responsibility for its correctness and suitability should you use it.

- nick
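The calling convention described in the blurb can be sketched in a few lines. The functions below are hypothetical stand-ins in a made-up ScanSketch namespace. They borrow the names and argument order that the OBJ sample uses (ScanForNonWhiteSpace, GetFloat) but are not the real TextScanner implementations, and the rule that a failed GetFloat returns the pointer it was given is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical stand-ins that only mimic the calling convention described
// above. They are NOT the real TextScanner implementations.
namespace ScanSketch {

// Returns the first non-whitespace character at or after pCurr, or pEnd.
inline char const* ScanForNonWhiteSpace(char const* pCurr, char const* pEnd)
{
    while (pCurr < pEnd &&
           (*pCurr == ' ' || *pCurr == '\t' || *pCurr == '\r' || *pCurr == '\n'))
        ++pCurr;
    return pCurr;
}

// Parses a float and returns the position just past it.
// Assumption of this sketch: if no number is found, result is left
// untouched and the original pCurr is returned.
inline char const* GetFloat(char const* pCurr, char const* pEnd, float& result)
{
    char const* pStart = ScanForNonWhiteSpace(pCurr, pEnd);

    // Collect only characters that can appear in a float, so that strtof
    // works on a bounded copy and never reads past pEnd.
    char const* pTokenEnd = pStart;
    while (pTokenEnd < pEnd &&
           ((*pTokenEnd >= '0' && *pTokenEnd <= '9') || *pTokenEnd == '+' ||
            *pTokenEnd == '-' || *pTokenEnd == '.' || *pTokenEnd == 'e' ||
            *pTokenEnd == 'E'))
        ++pTokenEnd;

    std::string token(pStart, pTokenEnd);
    char* pStop = nullptr;
    float value = std::strtof(token.c_str(), &pStop);
    std::ptrdiff_t consumed = pStop - token.c_str();
    if (consumed == 0)
        return pCurr; // nothing parsed: hand back the caller's pointer
    result = value;
    return pStart + consumed;
}

} // namespace ScanSketch

// Usage: read "x y z" from a buffer that need not be null-terminated.
inline bool ParseVec3(char const* pText, std::size_t size, float out[3])
{
    assert(pText != nullptr);
    char const* pCurr = pText;
    char const* pEnd = pText + size;
    for (int i = 0; i < 3; ++i) {
        char const* pNext = ScanSketch::GetFloat(pCurr, pEnd, out[i]);
        if (pNext == pCurr)
            return false; // no progress means no number was there
        pCurr = pNext;    // adopt the returned pointer as the new position
    }
    return true;
}
```

Because no state lives inside the scanner, the caller decides whether to adopt the returned pointer. Keeping the old pointer is a cheap way to try one interpretation of the text and fall back to another.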

 
WavefrontOBJParser.cpp:

// standard libraries
#include <vector>
using std::vector;

#include "TextScanner.h"

// model reader package
#include "ModelReader.h"

void Model::ReadWavefrontOBJ(char* pSource, int dataSize)
{
    char const* pCurr = pSource;
    char const* pEnd = pSource + dataSize;

    while (pCurr < pEnd) {
        pCurr = TextScanner::ScanForNonWhiteSpace(pCurr, pEnd);
        if (pCurr >= pEnd)
            break; // only whitespace was left; don't read past the end

        switch (*pCurr) {
            case 'u': // u is the first character of the usemtl command
            case 'g': // group
            case '#': // comment - skip over 'em
            default:  // unknown tokens - skip over 'em
                pCurr = TextScanner::ScanForEndOfLine(pCurr, pEnd);
                pCurr = TextScanner::ScanForNonWhiteSpace(pCurr, pEnd);
                break;

            case 'v': // vertex
                {
                    sdcVector temp(0, 0, 0);
                    ++pCurr; // skip past v
                    switch (*pCurr) {
                        case 'n': // vertex normal
                            ++pCurr; // skip n
                            pCurr = TextScanner::GetFloat(pCurr, pEnd, temp.x);
                            pCurr = TextScanner::GetFloat(pCurr, pEnd, temp.y);
                            pCurr = TextScanner::GetFloat(pCurr, pEnd, temp.z);
                            normals.push_back(temp);
                            break;

                        case 't': // vertex texture
                            ++pCurr; // skip t
                            pCurr = TextScanner::GetFloat(pCurr, pEnd, temp.x);
                            pCurr = TextScanner::GetFloat(pCurr, pEnd, temp.y);
                            pCurr = TextScanner::GetFloat(pCurr, pEnd, temp.z);
                            mapping.push_back(temp);
                            break;

                        default: // vertex position
                            pCurr = TextScanner::GetFloat(pCurr, pEnd, temp.x);
                            pCurr = TextScanner::GetFloat(pCurr, pEnd, temp.y);
                            pCurr = TextScanner::GetFloat(pCurr, pEnd, temp.z);
                            vertices.push_back(temp);
                            break;
                    }
                }
                break;

            case 'f': // face
                {
                    ++pCurr; // skip past f
                    char const* eol = TextScanner::ScanForEndOfLine(pCurr, pEnd);
                    int v[3];
                    int t[3];
                    int n[3];
                    int index = 0;
                    while (pCurr < eol) {
                        // get index
                        pCurr = TextScanner::GetInt(pCurr, eol, v[index]);
                        v[index] -= 1;
                        ++pCurr;
                        if (*pCurr == '/') // no tex coord?
                        {
                            t[index] = -1;
                        }
                        else {
                            pCurr = TextScanner::GetInt(pCurr, eol, t[index]);
                            t[index] -= 1;
                        }
                        ++pCurr;
                        if (*pCurr == '/') // no normal?
                        {
                            n[index] = -1;
                        }
                        else {
                            pCurr = TextScanner::GetInt(pCurr, eol, n[index]);
                            n[index] -= 1;
                        }
                        ++index;
                        if (index == 3) {
                            faceIndices.push_back(v[0]);
                            faceIndices.push_back(v[1]);
                            faceIndices.push_back(v[2]);
                            normalIndices.push_back(n[0]);
                            normalIndices.push_back(n[1]);
                            normalIndices.push_back(n[2]);
                            uvIndices.push_back(t[0]);
                            uvIndices.push_back(t[1]);
                            uvIndices.push_back(t[2]);

                            // if a polygon was encountered, turn it into triangles
                            // by creating a fan
                            v[1] = v[2];
                            n[1] = n[2];
                            t[1] = t[2];
                            index = 2;
                        }
                    }
                }
                break;

            case 'o': // indicates an object name. skip it!
            case 'm': // material library m is first character of mtllib
                {
                    pCurr = TextScanner::ScanForEndOfLine(pCurr, pEnd); // skip it for the moment
                }
                break;

            // bmf extension t
            case 't': // beginning a triangle strip
                // not handled yet; skip the line so the scan keeps advancing
                pCurr = TextScanner::ScanForEndOfLine(pCurr, pEnd);
                break;

            // bmf extension q
            case 'q': // continuing a triangle strip
                // not handled yet; skip the line so the scan keeps advancing
                pCurr = TextScanner::ScanForEndOfLine(pCurr, pEnd);
                break;
        }
    }
}

Content by Nick Porcino (c) 1990-2011