This was originally a follow-up to my question about YAML documentation, but it grew into a separate topic. This is not an announcement, but I think it's about time to get some feedback on the design. It's hardly specific to Ruby, but I guess it would work well in a Ruby context.

"Mauricio Fernández" <batsman.geo / yahoo.com> wrote in message news:20030223073420.GA13356 / student.ei.uni-stuttgart.de...
> On Sun, Feb 23, 2003 at 08:06:16AM +0900, MikkelFJ wrote:
> > BTW: what tool(s) did you use to produce the Yaml documentation?
>
> Yaml :) Take a look at doc/yamlrb.yod: it is a "Yaml document", to be
> processed by Yod (Yaml Ok Documentation). See src/yod.rb:

I kind of figured :-) But it must be post-processed to XSL-FO or something?

On and off for a long time I have been hacking on a simple XML format - like some of the people behind YAML - and then along came YAML. Meanwhile I changed focus towards a format that is meant to be especially suited for documentation purposes. YAML is also friendly to type as text, but still not the best possible for text entry. I hacked something together in Ruby but need to get back to it.

The primary motivation is that TeX is too complex and xml-doc is too cumbersome. On top of that there is the need for a plain text format: you can't trust word processors to be around in the long term, and they are bad for formatting and source control. The Ruby doc format is a similar approach, but not sufficiently advanced in formatting.

Perhaps I should ask for some help here in getting the format completed?

The design goals are
- an absolute minimum of escape symbols
- arbitrarily complex nesting
- automatic tag-close based on context
- support for meta-tagging (comments, other languages, notes)
- headers etc. should not be marked by = for level 1, == for level 2 etc., because that makes it difficult to move a section.

I see it as a possibility to use a Wiki-like interface for advanced text formatting purposes - and also for non-text purposes, but there YAML or even XML might be better.
Currently I haven't looked much into how to represent lists, a case where YAML clearly excels. I've written a preliminary spec, but I'm considering changing it a bit. Here are the main points (it's simple because that's the whole point). I currently have some problems handling paragraph breaks - I don't want to type them everywhere, but deducing them can be tricky.

I called it STEP: Structured Text Entry Processor.

Text is text. A blank line is a paragraph break (whatever that means in the given context). The only escape symbols are curly braces. These form a command.

Example:

    {chapter The first chapter}
    Here is text. The next sentence is bolded.
    {b This is bolded text}. This is not bold.
    {chapter The next chapter}
    Here is text in chapter two.
    {section a subsection}
    Text in section.
    {note needs cleanup}
    {chapter Also a chapter}

Clearly, tags (called commands) follow '{'. These are not predefined in STEP. STEP provides a means to define tag hierarchies, which enables one tag to automatically close another.

STEP also has two kinds of commands: those that have a header and a body, and those that have only a header:

    {b header only}, {chapter header} body {chapter header} body

I am actually considering having two different symbols for the two command styles:

    {b header only}, [chapter header] body [chapter header] body

But then I would have more symbols to escape. I am also considering moving the command name outside of '{':

    This is b{bolded text} this is not bolded.
    chapter{The chapter title}
    The chapter body
    section{Text in section}

However, currently the name follows '{' as in:

    This is {b bolded text}.
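To make the header/body distinction and the automatic closing a bit more concrete, here is a minimal Ruby sketch. It is only an illustration under stated assumptions, not the prototype: the LEVELS table, the Node struct and the names parse_items, nest and parse_step are all made up for this example, and the hierarchy is hard-coded here although STEP is meant to let you define it. Commands not listed in LEVELS are simply assumed to be header-only.

```ruby
require "strscan"

# Hypothetical hierarchy (an assumption for this sketch): a command listed
# here owns a body and is closed automatically by the next command whose
# level number is the same or smaller.
LEVELS = { "chapter" => 1, "section" => 2 }.freeze

Node = Struct.new(:name, :header, :body)

# Read text and {name header} fields into a flat list of Strings and Nodes.
# Fields may be nested inside the header of another field.
def parse_items(scanner, top_level)
  items = []
  until scanner.eos?
    if scanner.scan(/\{/)
      name = scanner.scan(/[^\s{}()]*/)
      scanner.skip(/\s+/)
      items << Node.new(name, parse_items(scanner, false), [])
    elsif scanner.scan(/\}/)
      raise "unbalanced '}'" if top_level
      return items
    else
      # Plain text; '\{' and '\}' are kept as escaped braces.
      items << scanner.scan(/(?:\\[{}]|[^{}])+/)
    end
  end
  raise "missing '}'" unless top_level
  items
end

# Fold the flat list into a tree: each levelled command collects whatever
# follows it until a command of the same or a smaller level number appears.
def nest(items)
  root = Node.new("document", [], [])
  stack = [root]
  items.each do |item|
    level = item.is_a?(Node) ? LEVELS[item.name] : nil
    if level
      stack.pop while stack.size > 1 && LEVELS[stack.last.name] >= level
      stack.last.body << item
      stack.push(item)
    else
      stack.last.body << item
    end
  end
  root
end

def parse_step(source)
  nest(parse_items(StringScanner.new(source), true))
end

doc = parse_step(<<~STEP)
  {chapter The first chapter}
  Here is text. The next sentence is bolded.
  {b This is bolded text}. This is not bold.
  {chapter The next chapter}
  Here is text in chapter two.
  {section a subsection}
  Text in section.
  {note needs cleanup}
  {chapter Also a chapter}
STEP

# The three chapters end up as siblings; the section sits inside chapter two.
puts doc.body.grep(Node).map { |chapter| chapter.header.join }.inspect
# prints ["The first chapter", "The next chapter", "Also a chapter"]
```

Splitting the work into a flat read followed by a separate nesting pass keeps the closing rule in one small loop, which seemed the easiest way to experiment with different hierarchies.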
Semantics are the most important part, but here is the basic syntax:

    <step>      ::= (<text> | <field>)(<text> | <field> | <break>)*
    <field>     ::= '{' <command> [<space>+ <step>] '}'
    <name>      ::= (SYMBOL except <space>, '{', '}', '(', or ')')*
    <text>      ::= (<name> | '\{' | '\}' | '(' | ')')*
    <command>   ::= <name> ['(' <arguments> ')']
    <arguments> ::= -- reserved for future
    <break>, <space> ::= -- see below

The only escaped symbols are '{' and '}'. '\' itself is not escaped; it only has a special meaning directly before '{' or '}'. If you want to write '{' you must write '\{', and if you want to write '\{' you write '\\{' (the first '\' is literal because it does not precede a brace, and the second one escapes the brace).

Spaces are usually merged into a single <word-break> command. To have explicit spaces in front of text, or just spaces, use the command with no name:

    The following are multiple spaces {   }and the following are multiple {   spaces followed by text}.

Not shown: there are special commands for handling source code text completely unescaped, using something similar to <<EOInput, and another simpler option where { } are only required to be balanced.

<arguments> are reserved for future use. They would allow a syntax like {font(courier, 10) some text in courier}.

Below is an attempt to define the space syntax clearly. The <word-break> and <paragraph-break> are significant. <space> is stripped and is only used to separate the command name from the following text. There are problems: how to deal with space before and after a field if the field evaluates to nothing, and several issues with explicit paragraph breaks versus implicit breaks (like the one after a chapter title). Therefore, a higher-level syntax must also be used to handle document output and clean up repeated breaks.
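Before the space grammar, a quick illustration of the escaping rule above, since it is easy to get wrong. This is a minimal Ruby sketch; step_escape and step_unescape are hypothetical helper names chosen for this example, and step_unescape assumes it is handed a run of text that contains no unescaped braces.

```ruby
# Hypothetical helpers illustrating the rule that '\' is special only
# directly before '{' or '}' and is an ordinary character everywhere else.

# Turn literal text into STEP source by putting '\' before each brace.
def step_escape(text)
  text.gsub(/[{}]/) { |brace| "\\" + brace }
end

# Turn a run of STEP text back into literal text by dropping the '\'
# in front of each brace. Any other '\' is left untouched.
def step_unescape(source)
  source.gsub(/\\([{}])/) { Regexp.last_match(1) }
end

# "\\{" in a double-quoted Ruby string is one backslash followed by '{'.
["{", "\\{", "a {b} c", "C:\\temp"].each do |literal|
  source = step_escape(literal)
  puts "#{literal}  is written as  #{source}"
  raise "round trip failed" unless step_unescape(source) == literal
end
```

Because a backslash is never escaped itself, escaping is a single pass over the braces and ordinary text such as file paths needs no changes at all.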
    <break>           ::= <word-break> | <paragraph-break>
    <space>           ::= (<blank> | <newline>)+
    <blank>           ::= SPACE | TAB
    <newline>         ::= (CR LF | LF | CR not followed by LF)
    <word-break>      ::= <blank>* [<newline> <blank>*]
    <paragraph-break> ::= [<blank>] <newline> ([<blank>] <newline>)+

UTF-8 symbols are handled directly by the syntax. In fact, the format is perfectly suited for binary encodings as long as '{' and '}' are escaped and space sequences are contained in { }.

Something I haven't covered here is how you can define commands as macros of other commands, and how you can define commands to be subordinate to other commands for automatic tag closing. While there is a syntax for doing so, this can also be defined outside of the scripting syntax, such that commands like {chapter} and {section} are predefined. The processor will also accept undefined commands, but in that case they are treated as having no body - that is, they stop exactly at '}'.

Another issue not covered is that commands inside the header or body of other commands may be treated specially within that context. Thus a command can act as a modifier to the active parent command: e.g. in {chapter {1} Introduction}, {1} acts as an enumeration command. This is partly why I haven't settled on arguments to commands. In fact, the entire header text of a command could be viewed as arguments to certain commands. E.g.

    {font courier, 10}
    {font {name courier}{size 10}}

STEP is only a syntax and a processor, so a separate layer on top of STEP would be needed for any particular purpose. One such layer could be a generic handler that generates XSL-FO, a subset of LaTeX, HTML and DocBook.

{Early-brainstorming}

As I mentioned, I am considering moving the command name outside of the curly braces, but I haven't investigated this further yet. I personally tend to think "bold" and then realize I need to add some delimiters, typically going back to add the curly brace.
Compare this to function syntax in LISP versus other languages:

    (print "foo") and print("foo")

Also, I am considering a special short notation for commands covering a single word:

    Only the word b,,only is bolded.
    Only the word {b only} is bolded.

Two commas occur infrequently in natural text but are quick to enter and easy to read. Two commas (or more) would be escaped by {,,}, analogous to escaping spaces. The notation could also be used with line breaks when preceded by a colon:

    chapter:,,This is chapter 1
    This is the content of chapter 1.

However, I don't really like too many special cases just to make things marginally easier. The format is much simpler if commands are exactly { } and nothing else. Still, I might buy into the ,, notion because it makes typing so much easier.

As mentioned, I do have some prototype code around - mail me if interested. I also learned that Ruby really needs a lexer tool; this was not as easy to implement in Ruby as I had expected.

Mikkel
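P.S. One way to keep the ,, shorthand from complicating the core syntax would be to treat it as a plain text rewrite that runs before the normal parser. A minimal Ruby sketch follows; expand_shorthand is a hypothetical name, and two readings of the notation are assumptions on my part: that the command name and the single-word argument are made of word characters only, and that the colon form takes the rest of the line as the header. The {,,} escape is left alone simply because it has no name in front of the commas.

```ruby
# Hypothetical pre-pass that rewrites the ",," shorthand into ordinary
# {command text} fields, so the parser itself only ever sees { }.
def expand_shorthand(source)
  # "name:,,rest of line" becomes "{name rest of line}"
  expanded = source.gsub(/\b(\w+):,,([^\n{}]*)/) do
    "{#{Regexp.last_match(1)} #{Regexp.last_match(2).strip}}"
  end
  # "name,,word" becomes "{name word}"
  expanded.gsub(/\b(\w+),,(\w+)/) do
    "{#{Regexp.last_match(1)} #{Regexp.last_match(2)}}"
  end
end

puts expand_shorthand("Only the word b,,only is bolded.")
# prints: Only the word {b only} is bolded.

puts expand_shorthand("chapter:,,This is chapter 1\nThis is the content of chapter 1.")
# prints: {chapter This is chapter 1}
#         This is the content of chapter 1.
```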