[GTALUG] Online Course for Lex/Yacc?

D. Hugh Redelmeier hugh at mimosa.com
Sun Sep 16 01:40:23 EDT 2018


| From: William Park via talk <talk at gtalug.org>

| Every now and then, I come across situation where an ideal solution
| would have been a custom parser/interpreter, either to parse data format
| or to write my own sets of commands.  In each case, I don't know enough
| to write it (I'm not Comp Sci) or don't have time.  So, I end up
| resorting to shell, awk, python, etc.
| 
| Now, I want to take some course on Lex/Yacc, finally!
| What online course do you recommend?

I've never used online courses.  Certainly not about this stuff.  So I
cannot answer your question directly.

What is it that you hope to learn?

- theory?

- practical use of these specific tools?

- for what problems are these tools a good choice?

What languages are you comfortable programming in?  I ask because
many languages have tools or libraries that might fully or
partially address your needs.

TL;DR: from here on is a long discussion of my experiences with tools like 
these.

I don't have the habit of using lex.  I write my own lexers in C.  Not
recommended: C is an unforgiving and unhelpful language.  But I've
done that for 40+ years.  (And mostly in assembly languages for the
decade before that.)  I can do it in my sleep.

I think lexing is pretty easy.  But maybe that's because I've already
thought about most of the choices that need to be made.  Perhaps lex
provides less experienced programmers with a framework that guides some
choices.
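
To show what I mean by a hand-rolled lexer: here is a toy one in C
that splits its input into numbers, identifiers, and single-character
operators.  It is a sketch I made up for this message (no bounds
checking, no real project behind it), but the shape is the same as
the real thing:

    /* Toy hand-rolled lexer: numbers, identifiers, single-char operators. */
    #include <ctype.h>
    #include <stdio.h>

    enum token { T_EOF, T_NUM, T_IDENT, T_OP };

    static const char *src;      /* input cursor */
    static char lexeme[64];      /* text of current token (unchecked: toy code) */

    static enum token next_token(void)
    {
        while (isspace((unsigned char)*src))
            src++;               /* skip whitespace between tokens */
        if (*src == '\0')
            return T_EOF;
        char *p = lexeme;
        if (isdigit((unsigned char)*src)) {
            while (isdigit((unsigned char)*src))
                *p++ = *src++;
            *p = '\0';
            return T_NUM;
        }
        if (isalpha((unsigned char)*src) || *src == '_') {
            while (isalnum((unsigned char)*src) || *src == '_')
                *p++ = *src++;
            *p = '\0';
            return T_IDENT;
        }
        *p++ = *src++;           /* anything else: single-character operator */
        *p = '\0';
        return T_OP;
    }

    int main(void)
    {
        src = "count = count + 42";
        for (enum token t; (t = next_token()) != T_EOF; )
            printf("%d\t%s\n", (int)t, lexeme);
        return 0;
    }

A real one adds keywords, string literals, comments, and line numbers
for diagnostics, but that is more of the same.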

I don't have the habit of using yacc.  yacc is a parser generator for
LALR grammars.  The construction of LALR parsers was first described
in DeRemer's PhD thesis.  But the first implementation was by Wilf
LaLonde at U of T.  For some reason that I don't remember, LaLonde
moved for a bit to Waterloo and I (an undergrad) found his generator
on the Waterloo computer (but no documentation).  I played with it,
reverse engineered how the generated tables worked, and started using
it.  So I was one of the first to use such a gizmo.
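
For anyone who has not seen one: yacc input looks roughly like this.
It is a toy four-function calculator (my own sketch, not code from any
project mentioned here).  The %left lines tell yacc how to resolve the
grammar's ambiguity by precedence and associativity:

    %{
    #include <ctype.h>
    #include <stdio.h>
    int yylex(void);
    void yyerror(const char *msg) { fprintf(stderr, "error: %s\n", msg); }
    %}

    %token NUM
    %left '+' '-'
    %left '*' '/'

    %%

    input: expr '\n'          { printf("= %d\n", $1); }
         ;

    expr : expr '+' expr      { $$ = $1 + $3; }
         | expr '-' expr      { $$ = $1 - $3; }
         | expr '*' expr      { $$ = $1 * $3; }
         | expr '/' expr      { $$ = $1 / $3; }
         | '(' expr ')'       { $$ = $2; }
         | NUM
         ;

    %%

    /* A trivial hand-written yylex(); this is the part lex would generate. */
    int yylex(void)
    {
        int c;
        while ((c = getchar()) == ' ' || c == '\t')
            ;                    /* skip blanks */
        if (isdigit(c)) {
            yylval = 0;
            do
                yylval = yylval * 10 + (c - '0');
            while (isdigit(c = getchar()));
            ungetc(c, stdin);
            return NUM;
        }
        return c == EOF ? 0 : c; /* returning 0 means end of input */
    }

    int main(void) { return yyparse(); }

Run that through yacc (or bison) and you get a C file containing a
table-driven LALR parser.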

It was intoxicating.  I invented more and more complicated language
constructs because the complexity cost me almost nothing.  My
particular project was to design a high-level assembler for the
PDP-11: think PL/360 (Niklaus Wirth's high-level assembler for the
IBM System/360).

I now think that this was a mistake.  Parsing is not the hardest
problem in writing compilers.  A hand-rolled recursive descent
parser is fairly easy and allows for clearer diagnostics (diagnostics
are important!).  But I did learn a lot.
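
To illustrate: here is a minimal recursive descent parser in C for the
same kind of arithmetic expressions (again a toy sketch of mine).
Each grammar rule becomes one function, and the error messages can say
exactly what was expected where:

    /* Toy recursive descent parser: expr = term {('+'|'-') term}, etc. */
    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>

    static const char *p;        /* input cursor */

    static void fail(const char *msg)
    {
        fprintf(stderr, "parse error at '%c': %s\n", *p ? *p : '$', msg);
        exit(1);
    }

    static void skip(void) { while (*p == ' ') p++; }

    static int expr(void);       /* forward declaration: grammar is recursive */

    /* factor = NUMBER | '(' expr ')' */
    static int factor(void)
    {
        skip();
        if (isdigit((unsigned char)*p)) {
            int v = 0;
            while (isdigit((unsigned char)*p))
                v = v * 10 + (*p++ - '0');
            return v;
        }
        if (*p == '(') {
            p++;
            int v = expr();
            skip();
            if (*p != ')')
                fail("expected ')'");
            p++;
            return v;
        }
        fail("expected a number or '('");
        return 0;                /* not reached */
    }

    /* term = factor { ('*'|'/') factor } */
    static int term(void)
    {
        int v = factor();
        for (;;) {
            skip();
            if (*p == '*') { p++; v *= factor(); }
            else if (*p == '/') { p++; v /= factor(); }
            else return v;
        }
    }

    /* expr = term { ('+'|'-') term } */
    static int expr(void)
    {
        int v = term();
        for (;;) {
            skip();
            if (*p == '+') { p++; v += term(); }
            else if (*p == '-') { p++; v -= term(); }
            else return v;
        }
    }

    int main(void)
    {
        p = "2 * (3 + 4)";
        printf("%d\n", expr());  /* prints 14 */
        return 0;
    }

No tables, no tool, and the control flow matches the grammar.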

Complex language syntax is a burden on the user. Languages that didn't 
learn that lesson include C++, Algol 68, and PL/I.

In the Libreswan project, Pluto includes a simple hand-rolled lexer
and parser for a kind of config file.  I wrote that.  After I left the
project, someone added a lex + yacc parser for a different kind of
config file.  When I rejoined the project, the lex + yacc code didn't
work right and nobody knew how to fix it.  I've kicked the worst
problems out of it.  But I think I could actually reduce the lines of
code and the number of bugs by rewriting it to use the lexer that is
already in Pluto, plus a minimal recursive descent parser.

LALR parsing is a very interesting discipline.  I'm glad I understand it 
(although it is a bit foggy after all these years).  I'm not sure that it 
is useful for most programmers.

Lexing and parsing are things programmers do all the time, informally.
Even if they don't know that's what they are doing.  Understanding the
problems formally can only be a help.

I learned this stuff from books, people, and hacking.  The first book to 
really present this stuff with a practical parser generator at the core 
was
	McKeeman, Horning, and Wortman: A Compiler Generator.
(I took courses from the latter two and met McKeeman.)
LaLonde's LALR parser generator slotted into that framework and was
much better than the parser generator that they used.
(Kildall's PL/M was clearly created with, and inspired by, this work.
He then used it to create CP/M, which in turn was copied to create MS-DOS.)

The field was relatively young and approachable then.  Compiler stuff has 
gotten way more developed and harder to approach.  As a result, amateurs
jump in without the full background and make long-understood mistakes.  I 
don't see a good alternative: there is too much to learn before doing 
useful work.

The hardest part of compiler writing is code generation and 
optimization (the "backend").  For many "languages" (like libreswan 
control files) there is no need for code generation.  Recent language 
implementations often use the LLVM backend.

There is a concept of DSLs (Domain-Specific Languages): small,
special-purpose languages.  Perhaps that's what you are trying to create.
Googling might turn up lightweight tools to help with that.

