20050104 SiSU is released
--------------------------

Announce
--------

Excuse the lengthy announcement, hubris and repetition.

A fairly big day for me: I have worked on SiSU for several years, though
only recently with its imminent release in mind...

The focus of SiSU is simple and sparse markup requirements (for single
documents or large document sets), used to produce structured,
multiformat published versions of texts, with a common/shared citation
system, and search possibilities that take advantage of it.

Little time has been spent on the installation procedure. I would
appreciate feedback from anyone who installs and tests SiSU on Linux and
BSD (and OSX?) platforms. I anticipate initial problems related to
installation and setup; I would be grateful for feedback on them, and
will be pleased to help with them.

Once past the install, I would very much appreciate feedback generally,
and especially from Rubyists (as the texts SiSU is designed to work with
are not code or its documentation, interest will not be developer
specific, and may be limited), librarians, document projects, and
academic writers, on aspects of interest to them.

Additional syntax highlighters for SiSU markup would be extremely
welcome; they need not be as complete as the vim highlighter. Emacs
would obviously be nice; of much interest would be the Ruby editors, and
also less geeky text editors, as it is hoped that SiSU will eventually
be used by non-coders.

I expect some criticism for hubris, for some off-topic (OT) opinions
expressed here (and elsewhere), and possibly for a coding style that has
evolved over the years and has not always been consistently updated
(and for the sparse use of spaces: put that down to using an editor with
excellent syntax highlighting, and to what I have become accustomed to
as a lone coder).

This release will primarily be of interest to developers, as install
and setup are hardly documented (it assumes you have independently
installed the external programs it takes advantage of, such as
Postgresql, have set file permissions, and more), and it has not been
tested across platforms. But if you are able to get it working, it does
quite a bit. Paradoxically, though SiSU is for documents, it is not for
programming documentation, which reduces its value to the very
developers who might currently be able to use it.

I ask much; there is no rush. (This is sadly a fairly busy month for
me, so my response time is going to have to be slow.)

I have enjoyed working on SiSU very much over a number of years, and am
pleased with what it does and how it does it. I hope it is of use to
others.

Ready or not, here it is, as it (currently) is, enjoy,

Download:
---------
sisu_0.1.0-9_2005w01-2.tgz

http://www.jus.uio.no/sisu/download/sisu_0.1.0-9_2005w01-2.tgz

SHA1(sisu_0.1.0-9_2005w01-2.tgz)=
14b230ba5a4c8f1c7264b38cd2d9c95a97477f3a
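To check the downloaded tarball against the SHA1 sum above, the usual
command line tools (sha1sum, openssl sha1) will do; as a Ruby
alternative, here is a minimal sketch using the standard digest library.
It assumes the tarball sits in the current directory, and the helper
name sha1_matches? is hypothetical, not part of SiSU.

```ruby
require 'digest/sha1'

# Hypothetical helper (not part of SiSU): true if the SHA1 hex digest of
# the file at +path+ equals +expected+ (case and surrounding whitespace
# in +expected+ are ignored).
def sha1_matches?(path, expected)
  Digest::SHA1.file(path).hexdigest == expected.strip.downcase
end

if __FILE__ == $PROGRAM_NAME
  tarball  = 'sisu_0.1.0-9_2005w01-2.tgz'
  expected = '14b230ba5a4c8f1c7264b38cd2d9c95a97477f3a'
  if File.exist?(tarball)
    puts(sha1_matches?(tarball, expected) ? 'checksum OK' : 'checksum MISMATCH')
  else
    puts "#{tarball} not found in the current directory"
  end
end
```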

Well wishes to all for 2005,
Ralph Amissah

What is SiSU?
-------------

(SiSU - simple, information structuring utility/universe)

SiSU is an electronic publishing system and a (hybrid) kind of document
management system (for the documents that it generates), with its own
unique set of features, including, amongst many others: very simple
markup; writing to the file system (for Internet, Intranet, or file
serving, including e.g. CD publication) and/or to a relational database;
multiple output formats (html, structured XML, LaTeX and pdf,
postgresql); and a citation system that is common to all output types.

SiSU is a (command line) text processing program that produces
structured electronic documents from a simple marked up input file
(using a markup syntax similar to smart ascii, which I claim to be
simpler than the most elementary html) in multiple output formats, from
html and structured XML, to pdf via LaTeX, to streaming into relational
databases (currently Postgresql). It writes in a structured way to the
file system or to a relational database, where it retains information on
each document's structure.

SiSU may be used either for individual documents or for
collections/libraries of published (as in finished and not subject to
continuous change) documents. The documents it handles are primarily
law (which can be quite diverse), literature, and some social sciences
(as opposed to maths, science, programming etc.). Several samples are
available.

Documents are marked up in "SiSU Syntax" in your favourite editor, and
SiSU, a command line driven batch processor, is run against the marked
up document(s) to produce the desired output(s).

SiSU (once installed and set up) should be easy enough for anyone to
use (with a bit of additional documentation). The markup syntax is
simple, and the commands are easy enough, with interactive help. It would
benefit greatly from additional syntax highlighters. (There are sample
input documents from which various outputs can be generated.)

As a proof of concept, the SiSU framework is in place, and many of the
modules have been used professionally for several years. There are many
more modules than the ones released so far; these have been held back
either because they have not been properly maintained, having fallen
into disuse, or because they are not generic enough in their current
implementation.

Information on SiSU is available at:
  http://www.jus.uio.no/sisu/SiSU/

Sample texts (and remember, SiSU is not specifically for books):
  http://corundum/sisu/sisu/2#h2.1.3

Possibly of greater interest, as an illustration of how different the
possibilities it provides are, is search:
  http://corundum/sisu/sisu/1#h1.14.6

And the markup from which this is derived:
  http://corundum/sisu/sisu/2#h2.2
  

SiSU provides
-------------

[This is part of a fairly recent attempt to explain certain aspects of
the project to a layman.]

SiSU provides a way, with minimal markup effort, to have multiple output
formats, taking advantage of some of their strengths, viz. html,
structured XML, pdf via LaTeX, and relational SQL databases, all of
which are tied together using a common citation system.

* simple markup (done once, it automatically makes the rest available),[1]

* possibility of adding semantic data to documents (currently the Dublin
Core, though it would be easy to incorporate other, or alternative
systems)

* multiple outputs - using industry standards, and taking advantage of
the rather different strong points of each (html, structured XML, pdf
via LaTeX, relational SQL database - currently Postgresql, retaining
structural information)

* a common citation system for all document outputs, including the
relational database, with searches able to take advantage of the
implications of the citation system (primarily the automatic, consistent
numbering of headings and paragraphs, in such a way that the numbers can
be used by, and to reference content in, all output types).

There is a list of SiSU features here:
  http://www.jus.uio.no/sisu/SiSU/1#h1.2
which I will tag on to the end of this document.

The document contains sample input and output files (in several places,
including here):
  http://www.jus.uio.no/sisu/SiSU/2#h2.2

The last thing to be done was a search front-end for the database,
which I finally buckled down to doing. The back-end has been in place
for a number of years now, but the front-end makes this feature a lot
easier to demonstrate. Unfortunately I do not have it online; here is a
link to images of it in its current form:
  http://corundum/sisu/sisu/1#h1.14.6

which relates to what IBM for example found to be of particular interest
early in the summer of 2004:
  http://www.jus.uio.no/sisu/SiSU/2004#795
  [location may change as this document is updated]

Some of those links will change with subsequent modifications to the
text; SiSU is best used for published works.

There is much to browse generally; some of it is just fan material on
other technical things that I have found useful.

The document
  http://www.jus.uio.no/sisu/SiSU/
  http://www.jus.uio.no/sisu/SiSU/portrait
  http://www.jus.uio.no/sisu/SiSU/landscape

[1] e.g. marking up War and Peace (from a Project Gutenberg ascii text)
takes a little over an hour. This is a reduction in the effort required
for the preparation of texts: XML, for example, the buzzword of the
industry, is labour intensive and complicated, and LaTeX is also a lot
more complicated than SiSU markup syntax. They are more flexible, but do
not provide the composite solution: single command building of documents
and/or populating of a relational database, while retaining structural
information.

Platform
--------

Unix/Linux. 
[I have not glanced at other OS's for the purpose of
development since 1999.]

Developed and tested on Debian/Gnu/Linux Sid.

Short summary of features
--------------------------

from http://www.jus.uio.no/sisu/SiSU/1#h1.2

(i) minimal markup requirement;
(ii) a single file marked up for multiple outputs;
(iii) markup is simpler than html;
(iv) the simple syntax is mnemonic, influenced by mail/messaging/wiki
    markup practices;
(v) human readable, and easily writable syntax;
(vi) multiple outputs include amongst others: "html"; "pdf" via "LaTeX";
    (structured) "XML"; sql, currently "PostgreSQL" (and sqlite);
    "ascii" (also "texinfo");
(vii) takes advantage of the strengths implicit in these very different
    output types (e.g. LaTeX: professional document typesetting, easy
    conversion to pdf or Postscript; XML: in this case, structural
    representation; sql relational database: e.g. document search,
    representing constituent parts of documents based on their
    structure (headings, chapters, paragraphs) as required, control of
    use; important enough to be given a heading of its own);
(viii) provides a common citation system for all outputs (object
    citation numbering): all text objects (headings and paragraphs) are
    numbered identically, for citation purposes, in all outputs
    ("html", "pdf", sql etc.);
(ix) use of Dublin Core and other meta-tags to permit the addition of
    some semantic information on documents, making easy the integration
    of rdf/rss feeds etc.;
(x) creates an organised directory/file structure for (file-system)
    output;
(xi) easily mapped: with its clearly defined structure, and all text
    objects numbered, you know in advance where in each document output
    type a bit of text will be found (e.g. from an sql search, you know
    where to go to find the prepared "html" output or "pdf" etc.)...
    there is more;
(xii) search of document sets: the relational database retains
    information on the document structure, and citation numbering makes
    it possible, for example, to present search matches as an index of
    documents and the locations within each document where the match is
    found (an image series added December 12th 2004 in the Chronology
    pages, somewhere around http://www.jus.uio.no/sisu/SiSU/2004#781,
    gives an idea of what is possible; unfortunately I do not currently
    have the hardware set up to demonstrate this dynamically on the
    www);
(xiii) "word maps": a rudimentary index, consisting of all the words in
    a document and their (text object) locations within the text;
(xiv) very easily skinnable: document appearance is easily
    controlled/changed at a project/site wide, directory wide, or
    document instance level;
(xv) easy directory management and document associations: the document
    preparation (sub-)directory may be used to determine the output
    (sub-)directory, the skin used, and the sql database used;
(xvi) in many cases a regular expression may be used (once, in the
    document header) to define all or part of a document's structure,
    obviating or reducing the need to provide structural markup within
    the document;
(xvii) is a batch processor for handling large document sets, though
    once generated, documents need not be re-generated unless changes
    are made to the desired presentation of a particular output type;
(xviii) pre-processing is possible, which permits the easy creation of
    standard form documents, and templates/term-sheets;
(xix) easy to add, modify, or have alternative syntax rules for input,
    should you need to;
(xx) (future-proofing) extremely modular (thanks in no small part to
    Ruby): if another output format is required, write another module;
(xxi) (future-proofing) easy to update output formats (e.g. the html,
    xhtml, and latex/pdf produced can be updated in the program and run
    against the whole document set);
(xxii) scalability, dependent on your file-system (in my case Reiserfs),
    on the relational database chosen (currently Postgresql), and on
    your hardware;
(xxiii) a framework for adding further capability as required;
(xxiv) tied to a version control system, only the code and the marked
    up files need be backed up to be sure of the much larger document
    set;
(xxv) document management;
(xxvi) use your favourite editor; syntax highlighting files for the
    markup are, so far, primarily for (g)vim.
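The "word maps" of (xiii) amount to an inverted index from each word to
the object citation numbers where it occurs. A minimal sketch, assuming
the document has already been split into numbered text objects (here a
hypothetical Hash of ocn => text) and using a deliberately naive
tokeniser; the method name word_map is hypothetical and this is not
SiSU's implementation.

```ruby
# Build a hypothetical "word map": each distinct word mapped to the sorted
# list of object citation numbers (ocn) where it occurs.
# +objects+ is a Hash of ocn => text. Tokenising is deliberately naive:
# lower-cased runs of letters and digits.
def word_map(objects)
  map = Hash.new { |h, k| h[k] = [] }
  objects.each do |ocn, text|
    # uniq: a word repeated within one object is recorded only once
    text.downcase.scan(/[[:alnum:]]+/).uniq.each { |word| map[word] << ocn }
  end
  # return a plain Hash, words in alphabetical order, ocns ascending
  map.keys.sort.each_with_object({}) { |w, out| out[w] = map[w].sort }
end

if __FILE__ == $PROGRAM_NAME
  wm = word_map({ 1 => 'War and Peace', 2 => 'Peace, at last.', 3 => 'war war' })
  wm.each { |word, ocns| puts "#{word}: #{ocns.join(', ')}" }
end
```

Since the numbers are the same in every output, one word map can point
into the html, pdf, or database copy of the text alike.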

SiSU was developed in relation to legal documents, and so is strong
across a wide variety of texts (law, literature...); though weak on
formulae/statistics, it does handle images. An underlying assumption has
been that document sets are to be preserved and maintained over time
(also a result of the legal text origin). SiSU has been developed and
used over a number of years, and the requirements to cover a wide range
of documents have been thoroughly explored.

Standards
---------

Outputs are to standard protocols or open source software.

I would like to keep SiSU markup and meta-markup a standard, although
the design of the SiSU program makes them easy to modify.

I lay claim to "object citation numbering", a very simple idea with
which I have persisted for many years, which makes much possible, and
which is a unifying feature of SiSU output.

Generated by SiSU
SiSU Sabaki 0.1.0-8 2004w51/4
www.jus.uio.no/sisu/SiSU/
Using:
Standard SiSU markup syntax,
Standard SiSU meta-markup syntax, and the
Standard SiSU object citation numbering and system
© Ralph Amissah 1997, current 2005.
All Rights Reserved.

Separating the markup syntax (human readable, and usually human
prepared) from the meta-markup syntax (machine written) has interesting
possibilities.

  (i) It is possible to change the markup syntax (or have several
  alternative input syntaxes) without disturbing the downstream program
  modules/libraries, provided you write to the same standard meta-markup
  syntax. (If you used the original syntax and then changed to an
  alternative syntax, you would presumably have alternative standard
  meta-markup generators, or convert the original syntax to the
  alternative syntax.)

  (ii) It is also possible to change the meta-markup syntax, with
  consequences for all the downstream programs, but without in any way
  affecting your document set (your marked up documents).

Both have been very useful over the years of SiSU's development and
use.
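Point (i) can be illustrated with two input syntaxes that both translate
to one intermediate form, so the downstream output module never changes.
This is a sketch under loud assumptions: both syntaxes (a textile-like
"h2. Title" and a wiki-like "== Title =="), the [:heading, level, text]
intermediate form, and the names parse_syntax_a, parse_syntax_b and
render_html are invented for illustration; they are not SiSU's markup or
meta-markup.

```ruby
# Syntax A (hypothetical, textile-like): "h2. Title"
def parse_syntax_a(line)
  m = line.match(/\Ah([1-6])\.\s+(.+)\z/)
  m ? [:heading, m[1].to_i, m[2]] : [:para, line]
end

# Syntax B (hypothetical, wiki-like): "== Title ==" (level = number of '=')
def parse_syntax_b(line)
  m = line.match(/\A(=+)\s+(.+?)\s+\1\z/)
  m ? [:heading, m[1].length, m[2]] : [:para, line]
end

# Downstream output module: it sees only the intermediate form, so it is
# unaffected by which input syntax produced it.
def render_html(meta)
  if meta.first == :heading
    _, level, text = meta
    "<h#{level}>#{text}</h#{level}>"
  else
    "<p>#{meta[1]}</p>"
  end
end

if __FILE__ == $PROGRAM_NAME
  puts render_html(parse_syntax_a('h2. Chapter One'))
  puts render_html(parse_syntax_b('== Chapter One =='))
end
```

Point (ii) is the mirror image: changing the intermediate form means
updating the parsers and renderers, but never the marked up files.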

The object citation numbering system (ocn) is a simple idea which,
being relevant to both man and machine, has far-reaching possibilities.
All output uses the same object citation numbering, including database
searches, which can present matches as an index of documents and the
(hyperlinked ocn) locations within each document where the match was
found.
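Presenting matches as an index of documents with hyperlinked ocn
locations needs little more than grouping the hits by document. A
minimal sketch, assuming the database search has already returned rows
of [document, ocn] pairs; the row shape, the method name match_index,
and the link form "base/document#ocn" (loosely modelled on links such as
.../SiSU/2004#795 above) are assumptions for illustration.

```ruby
# Group hypothetical search hits, given as [document, ocn] pairs, into an
# index: one entry per document (alphabetical), with the unique ocns where
# the search term matched, ascending, each turned into a link of the
# assumed form "<base>/<document>#<ocn>".
def match_index(hits, base)
  hits.group_by(&:first).sort.map do |doc, rows|
    ocns = rows.map(&:last).uniq.sort
    [doc, ocns.map { |ocn| "#{base}/#{doc}##{ocn}" }]
  end
end

if __FILE__ == $PROGRAM_NAME
  hits = [['gpl2', 7], ['sisu', 12], ['gpl2', 3], ['sisu', 12]]
  match_index(hits, 'http://example.org/sisu').each do |doc, links|
    puts doc
    links.each { |link| puts "  #{link}" }
  end
end
```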

However, it is of interest to keep both the markup and the meta-markup
syntax relatively stable, and indeed to have a Standard. I claim this
standard (at least the original standard).

License
-------

(i) GPL 2 or later, for non-commercial use of the program and
publications

(ii) everything else, that is everything that is not (i), distributed
under a commercial license (terms to be determined)

Expanded upon a bit:

GPL 2 or later.

Or under special license terms from Ralph Amissah, the details of which
are to be determined. The idea is that it can be incorporated into
proprietary systems, under a proprietary license, for a per seat fee.
(SiSU was identified as being of interest as a middle-ware application
by a large database and document management software provider...)

From this point on there will be a GPL branch and a proprietary branch.
I expect that, if there is any take-up, the GPL branch will advance
faster and further (in my hands and generally) than the proprietary
branch.

SiSU is the result of several years of research and development in
electronic publishing, commenced in 1993 and under active development
since 1997. There is always more to be done. SiSU is released under GPL
2 or later, http://www.gnu.org/copyleft/gpl.html (first on January 4th
2005), and is alternatively available under special license terms from
Ralph Amissah, the details of which are to be determined.

Setup/Installation/Use
----------------------

To start with see the README file provided with the program.

Historical note
---------------

SiSU is the result of a several-year journey of research and development
related to electronic publishing, in particular related to legal and
academic writings. It started with the discovery of the Web and a
project to publish legal documents on the Web in 1993. Programming
started later, but ideas as to what would be useful to have and be able
to do started forming from that beginning. I was lucky enough at the
time to work with Geoffrey Armstrong and Tommy Johanson (who wrote the
first lines of Perl I ever saw).

Programming SiSU, setting ends and attaining the ends set, has been a
solo effort, from which I have learnt masses, and come to appreciate and
depend on the work of others, none more so than Matz of Ruby fame.
Within the Ruby community I have learnt lots from others, in particular
from Ruby book authors, both paper and electronic (I would guess Dave
Thomas, Why (what's new in Ruby 1.8.0, and yes, even bits of the
Poignant Guide), and Hal Fulton, in roughly that order; Slagell's book is
decent, and I would not have minded starting on Ruby with that), and
from those most vocal in the newsgroup and irc channel (too many to keep
track of, let alone mention; Eek and Batsman, and earlier in time DBlack,
deserve special mention). I have not used the recommended route of
studying the code of other projects (perhaps one day). The Ruby language
is remarkable, as has been the Ruby community to date.

I have not studied other document/text processors as such either. My
impression is that SiSU must be much easier to use than, say, DocBook,
but offers a different range of features. (Perhaps I should not mention
it at all; I don't know.)

I have always planned to share this work (under a dual license, one of
them being the GPL). A brief encounter with IBM in 2004 (a Software
Innovations evaluation) had me scrambling to the U.S. in June/July to
arrange a provisional Patent application as the only way to talk to them
meaningfully (and wondering, if that was the route I wished to pursue,
why I had not done so seven or more years earlier). The employee
concerned left, and interest has not persisted, fortunately. As to where
I stand on software patents: in their current form they appear to be
primarily a tool to stifle innovation, not to promote it (indeed, this
is why what I have done is a lot more interesting to a large company if
I hold a patent than otherwise), and one that only large companies can
afford, both in their application and in their enforcement through
litigation. Europe would do well not to have them.

If I were not pleased with Debian/Gnu/Linux (Sid), its packaging system
(developers and range of applications) and social contract, I would
almost certainly use one of the BSDs as my development platform:
FreeBSD or Dragonfly.

What SiSU is not - SiSU is not
------------------------------

* blogging software (though I sometimes misuse it in this way).

* a wiki (well obviously, though it would be interesting to use this
technology alongside a wiki, the wiki being used for constantly updated
pages and navigation information, whilst SiSU is used for published
works that are not changed frequently, e.g. a published academic
writing, a book, a convention)

* for programming documentation, or for mathematical or scientific texts.

Todo
----

This is a fairly large project, much remains to be done. Of particular
interest, without any time scale or immediate urgency:

* Documentation. There is some, but the presentation is nowhere near as
digestible as it should be.

* Documentation apart, the biggest single todo is Unicode processing.
LaTeX and Postgresql support UTF-8, so that is what it is most likely to
be. My excuses for not having looked at it yet: no need to date, and not
having configured my environment for it. I do, however, recognise this
as a need.

* Getting the Sqlite module working again. It is similar to the
Postgresql module, but fell out of maintenance when I found Sqlite to be
a bit of a pain to install on Debian (and was prioritising Postgresql).
Once upon a time the modules were in sync, and I hope to have them that
way again someday.

* Much code cleaning ... this project has developed over several years,
and there have been many changes in how things are done, without
rigorous removal of dead code. 

* simplify installation, and test across other Unix and Gnu/Linux
platforms.

* object citation numbering is currently done only for substantive text
and other objects (such as images); a secondary numbering will
eventually be implemented for non-substantive items.

* decide what to do with images and tables in XML and in relational
database.

* Marshalled/PStored Metaverse. As an alternative (not replacement) to
the current ordinary text based SiSU meta-markup state.

* Additional syntax highlighters. The current syntax highlighter and
folds are for vim. Additional syntax highlighters for SiSU markup would
be extremely welcome; they need not be as complete as the vim
highlighter. Emacs would obviously be nice, but the Ruby editors and
less geeky editors are of much interest. I am not sure that I will do
this myself; after all, I do use Vim. We'll see.

* My vim configuration files are a total mess, but are provided as is.

Help/suggestions welcome.