--------------030408050505030807000305
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hi again,

I wrote:
> Hi all,
> 
> Here's my solution for the quiz No 74. It generates text with a first 
> order Markov Chain. Because the matrix of a Markov chain for native 
> language tests is sparse, the chain is stored in a hash of hashes for 
> better memory usage.

Now i implemented higher order Markov Chains as well. I could not resist 
:-)

A first order chain, for example the following one

     @mc3  arkovChain.new()
     @mc3.add(1,2)
     @mc3.add(3,4)

is stored as the following hash of hashes

	{12}, 34}}

A higher order chain, for example the following

     @mc7  arkovChain.new(order  )
     @mc7.add_elems([20,21,20,22,21,22,20,23])

is also stored as a hash of hashes, but the keys are arrays

{[22, 21]22}, [22, 20]23}, [20, 21]20}, [21, 
22]20}, [20,22]21}, [21,20]22}}

Because i am new to Ruby, i had to learn a few things about Arrays, 
Objects and dup(). I still do not understand this fully (perhaps it is 
to late here 00:46 CET), but in the method MarkovChain.add() i need to 
dup the arrays, otherwise some keys will get overwritten.

In class MarkovChain
   # Add an edge from node a to node b.
   #
   def add(a, b)
     # this dup is needed, otherwise elements will be overwritten
     a  .dup() if a.instance_of?(Array)
     @absolute[a] || }
     @absolute[a][b] || 
     @absolute[a][b] + 
     @sum_edges[a] || 
     @sum_edges[a] + 
     @dirty[a]  rue
   end

I will try to find it out tommorow. By the way, below is an example text 
i generated for order 3 after i fed 4 books of Chesterton into the 
chain. Beginning with order 3 i think the text looks like a patchwork or 
collage of some phrases and sentences of the original texts.

Best regards,

Joern

. Up to a certain point it was a wordless clamour startlingly close, and 
loud enough to be distinct if each word had not killed the other. No, 
replied Fisher. The Renaissance nobles of the Tudor time were like that. 
It is awful, I think it goes far enough! said Flambeau; but my 
information is fragmentary, and only twelve members of the Central 
Anarchist Council. The gentleman who has for some time past played, with 
propriety and general applause, the difficult part of Thursday, has died 
quite suddenly. Consequently, he was willing to go into Parliament as a 
yeoman or a gentleman or a Jacobite or an Ancient Briton, I should say 
that, Crane went on. This also tempted him, that in a more partial and 
also a more premature fashion, for his voice was colourless and sad. 
What I did next does matter: I gave him a rather wild stare, March had 
yet another unexpected emotion, for his guide hailed the man as Hoggs 
and introduced him as Sir Howard Horne, then introducing his so-called 
Socialist budget, and prepared to expound it in an interview with so
promising a penman. Harold March was to have the gold of Glengyle. So 
far the crime seemed clear enough; and while Boyle was looking into it 
he heard a thud behind him, and then a sudden stillness, as of a stone 
statue walking. He might have been firebrands. The mutinies simmered 
down; the men who must have known that particular house to be so 
accurately inaccurate. But what makes you think it a specimen. Put the 
same feather with a ribbon and an artificial flower and everyone will 
think it a wildcat, though it may have been a clever fellow, answered 
the priest. As they raced along to the gate out of which poured and 
rolled, not Roman, but very late, and that this, my successful
masquerade, might be of considerable value to the public, placed it here 
with his own pale blue ones, and said, Do you want me to tell you all 
you want to guess what he doing, keep behind him. About his life and 
fortune on the table, and went below the surface in a way that at once 
puzzled and reassured him. On the oranges was the equally clear and 
exact description, Finest Brazil nuts, 4d. a lb. M. Brun had a dark, 
handsome lady, of no little majesty, and rather like a mosaic palace, 
rent with earthquakes; or like a Dutch tulip garden blown to the stars. 
The other did not answer; he was free in free London, and drinking

-- 
Joern Dinkla, http://www.dinkla.net



--------------030408050505030807000305
Content-Type: application/x-gzip;
 nameoern_dinkla_quiz_74_ver2.tar.gz"
Content-Transfer-Encoding: base64
Content-Disposition: inline;
 filenameoern_dinkla_quiz_74_ver2.tar.gz"

H4sICDzeOkQCA2pvZXJuX2RpbmtsYV9xdWl6Xzc0X3ZlcjIudGFyAO08bXvbOI75evoVrNKd
2FtZseS3Pumk3bRNO52bNrNpuv3g+vzQNmNrI0tevdTJNL3ffgBI6sVvcWd6ne6dOR1bIgEQ
BAEQBOkEIrFHXnDlcztKBzf2v1LvN7vTPDw/PXn++tROrpO9P1zqUNrN5l5dv6vvRrPtaBin
3nTaLSjNxp7TbLUajLH63jcoaZzwCFgMwkE4ulkPd1f7v2m5mHgxG3mRGCZhdMOGYZBwL4jZ
FJ9Hgn0UUeyFAXPZZRixZCLYOagJ+zuoCXsTsk6TGRU2SZLZ0eHhfD4nJSIdGobTw6ptGK8S
NhaBiHgiYpaI64TNvWTCXvPoKvzInk2ot/CS8eCGhdFIRDZ7KoY8jQX1NuVJ5F0TADMU0hCR
iJ+AJ95HwXwejFM+BgwRJzGDEcUzHsXCIhISHCthiGLE4AVoTXg8QbL4DZwhtYFIEhGxqZii
KNIYKMIALiYiEozD/3E4FSycCSQWp4DkAX2wHYsN0gReRt6IBWECJEfUceIBfBIybzrzxVQE
iQG1U+M/arqnmYig3ykPhgJZQRw5RBLLfsSDUaUK/CSTcIRo8I5wwzCNpHgUY5z50C20FJkj
MUcinsHUIhcgqzTiPjMYlUxmsygcijj2grESF077KAQKOBZoHaXA3jjiU5gLbwgUPnIfRqro
xDAuAQOILYBn/0rDRD2CAky5xUQyRC1AQkQDJ4D48QWPAqlNOC4ppyiE6ZS6Bwyx+MrzfSA3
UBrhsTDwb3DyAMy1GmwuxBWpj7gGYXrIiBw40rXZKzYI0/EkoYGZvyoWkDK2m9AaXjEDOHiO
BC8m4ZTHwLHNuG8TV+gcEkHcRYJL1ueh6pbYhrcRv4GXcUgVhgeaIwcppwW79nmcsOZDNsGZ
Q7ZIbwozSKoFOpvEWhFoGow5QeRwSnHi0E8TMEuAjlAd/BG06EYQBUx7rDUdlBkImWiYXjwx
2Zzf0IQQlz/zj5zY/lVEvmIYsDKeb2CS5EA9kKvvI5acsyCcA2ScoFQM4BkJ4MiGPABRgQ0N
0JRuChOjVFIZXG560J9EJl80hxkw+PxqzqORUnjQL2BtBiZhsxOsAeOPPJA642iDeo6Uzkg/
BZTQDUGzEaejkUC1gRn0vd9wNicc7RXtNAgDZXtenJl1oNTnPYzxXuaOgNQVjhqFc5kGQ5wA
sIbMkH7i8ZUAEXmMT6V0UacTdH1hBEo85bMD+PL5dDDiBzBKNA0lW3ByEp460jjUOPDD4ZWe
FGVF2uNINfKUQQD6WCRywgv2QMqjxwQzf8luwpQRBjpdtFOwYeBMTTFo0tjnSueHwiM4nIt4
LmgJmAKV0+sk4kOp3TwaTgDKMO4zmAp2/dsl+2cooqAvA4s+TkS/07Sh0R7/ZhjPOPRSGA2R
n0VekCjNB8cBNb4HUxPfwIJ0bSPtIczWmmjF9wYAgFUgZS+oTcmL1sjz29GA1WoT4c8M4yms
DzCoMehWbBnGz8gle04ELfYjMf03RR+6emyxn38I06n/KIMyjP9j6/8GiR4uiTIGWX7N+K+9
HP+5jc6e02k32rv475uU/XuHaRwdgo/h/uHACw7JiGpzw4gE6AJ4jgPwKeEs8cNgfJBXYix1
U1ORXRgVWooKU6ieQ3AXHxjGPvsVLR0WgFSaO1h4OIu92Db2jZG4zN4rVYgwZilAmqsNG7WR
daVp9/AhSHHR6YeXfeqMvbmtBewNNc29ESw9729rc/aeKiCuHYSxuK19pFeKPNnZbe2MhT0K
Uy89X8SmIYIR8gyMgmNOx7CKjP1wQEFQ5PEBwNC6TO4eVx5yY7gM6eGRj4Pl2Pjl7GX/xatf
Tvu4teq/efeaHTMHLIMa3p+dP3+bV2KXJTcIE4BRuHH/H6fnT8/engJY4PnGfcB4enreP3sh
KSjs++9fPb/4CV46D437UH96jg2GAVRieHpJE/oLTCh4uXkFxNw1pRhNi5m1CXzmIEdHb876
J+cv370+fXPBepaCXpA1IWIlPQQLFM5P//7u1fnp8xV0SPASGx7pIdwem6aVkObbI6mpJ7SP
m8ZqVKXIbMGHEwiJ2S28WTDf41sgNeQxzQs8QpQWMC1CiotLSsxgNfaSSr1aAF2UH4GtmM5X
EF2PRVSBTovoegwSLVeKJEpFAUzKRwJpnVhDUU6EBNUqswAqMLwlgyDjAACQ1T8Mw7tkVGH7
IhgnE9iyHB+zOiC8vQA65zYaccU8jaIwOmKvPdproBRTjDtMpFwSFwnLqWrTQ7uHL4icaQN3
wzKnY8Rj4OEtVr7UdaTR0pqP1UCqgH0pwD512CG5BzvF8EyOJZ9ifMXpjcc2H436+FrBD+KH
MbJM7g9THza02sAHfOD5XuKJGGVRySckd2HPFA4OvYTBbNs2pazO3l3Yl34agwRp8MDCdGhD
DAa4fWAeqqF3vZuWzhMHv7ZTJRXoc0MnJNNjHC89VhaVkDrNIzTY16Bro30rbDdGhheMxDXD
CacdAz7A7hviTnyaT0B0TIL8KLnNtAT42c+3CtmWAyJyhjbBhNo0M6bpSTo0WiLVpQqLOT12
/N/ssMuso0f3nti9Q1Z9BGgIKFFqkjSopTIDaZYR98CGTbnDh/ify4590E3JkQy6pRXJfjUd
YumBowiRuCVLCGDb1NyjNi0fqjLk1pvkJEf1ABwzmZahQR+gr8aKJao1p2d80/ivFEV/9fxf
y3WbC/Ffu91y9+ChUd/Ff9+kZPHZLBJJclMjI6cwDf6dlNJ0eXKQYzpoNoG1wwOvKUZjFQX5
fABGO5L7/JKXg+AOtvdC4eW5ONjAYwZFZgIshptxnWfkA8pyFB1sAUTGZOA+fJkCLPdGfQXh
SGTpFNnxMIwwIxbKfBPWX4mbDEZygXGoZLaUf0OmwTuA5dLCf1YJqhbxGk9xXfE9zDyqLAIL
VU5QMkr+REFAiM3B38YCN83Zfltlbs4q/pgFsF4ZxhAsPi5mA2HlYQw8btTHRBQAH9EqB5Xk
R18FMHJKb8SlsUAb/I9xtZdBZOujI73X3/S7pghVmfSP2afPsiqTdV4Vp9O+nP68buRFyY1+
l36NOPxDU5IPg0iUebQRo1Itd3cywpQFKSe7jMIpIYLmYiYUnwZForDSVzhMmZTHvpzHUTrD
uQwgdBAjUDzMws1xwVALU5xNXwjh2DzyYA4DIsBh+NwGfNAeWDW4jYkyzPJCvPekchJF/KZa
HkKX99jtLQqNLTV0B7KtvrKFVovyZGhi9RX1OTjNE1ZlUSMFXEp2aOSosZRb1eOdcY8ksCC6
PgLEFfqU46Kl0sF1UulWVWWLudq4yDfCoNALBIXB1+BWtbB8SiQxnpGQRAaaBM04kopFoY9u
b4suJOFS1LrAbomRYp0dT7zLpJLXaT5k6wzjqwwj45G+MynrkBCdp4pMMxMbptN0pWcrSL4U
GC5aaG4cpIbldpJHH2eTRMJgeHQcokUjKVc4yT5TlIUxKDsrjOJ3jgD6IdKAgTGjXS8zK1W0
qPaFIcWEU1bxvzJnBY3FMQ8sPMdIhR6z7B0/H4BWUBM7JPp6GovU0PII2ihOsBSVPQKlS4Qe
VhGtLLqX2bkYp7UGnFScDvEwBpaV8FJ7LD4EH0mpbeUgy/sHOkcRI0mSPB3CiGuOp06UyijL
HBc1xdqGeSa3BZb6BDxJWY4rKxd3fpJ6kkZBZu5godjLOiynjFWCQ//erfcWDX0/W285ZUl4
pNdVTNNTELK0Yl+cPT9jfjiGl/JarGjKiT9m/ylAqL4tQwCtIvDijWjCjssMxjDjw4R9Avdi
0fTc0id7rOh9Lg8uJwSjygZW9A44U0nY1/kDWCFTP8n9mnr/8UdmFkIEc2FZXNB4kOKSnWdk
9j9Bsy2DwL7s+fMRO3n69ojtf8qND4B6C1CwuTo//YXAMpGsADNLliJ7tv8ZekHF/BCYevGm
fc/ernwX+f/iluB/Y//nuM12ef/XbNTdBuz/2s4u//+N8v+G3mu88fxn+KC8T8l6yWZN8OSm
slOyVYn3wrsOaB1cjZU9LqL5IU++GOttAso4/gI0jUhBt7EWUfoj9Otds2tKJwtO/bIYOYqy
24wxlu4yUfZzFjMtk/XYCm8X27NQ7Qh0RbYAPlbrX4m02QNKJQLSX5qFrU5BNj+Ba99uhJ++
mxF+3m6EOz/9Z/j/hTO+370ErD3/bXXchfyf03Hqe06nUe/s/P+38f/MWHd0SwnAbPZlrgse
KL/2wlM71tKVPAq56dYepcxi2jvj8QugZocpVTpMyE4xuDrVWcCVtZUqZuO06y+e8yxlw6ZD
a31K7M48mMq6TIeUOCnE1KXzJPpekzMrJ0/o2IAOodWerUByIe9EYDkDdp5QkS0rEjMoxxWE
F9My+dkVy08pzHMQGN2TQiL7n/DrszrcKB0hMTWAY/Yev226UVk8D8tSKAtoxY7w3tH+JyKk
loLPkqxtlmBfBbGI6HBMdkr72BKHCnzhBKu0+14vvOKGW6gdd0HtAtVrxbRNubzR1axhmMqz
fK6PhkiQWsRSQwNLa1RfywviJMUZcgIVWtmXgCWYPEvL82drUgSS4WyHX0rNg+KGl0xKGIaD
FiaT83Q5Tt2840SylGJALopqU+S0wKEvrr1hGOC2F4RMKdhKWUdk/oJGVVHQFAfIycacQUki
ah/r2OksCXEvjuGHp8OPQFwnREp1SLvwAjs4DA/zI/K8TmbmRxZL+JXIcyl0UqClpXgyssxf
1kfO3F3jyVKKWQaCKBXznIUcYaYNuqNCmxQD7LsJKE8iFikvQi2lHFUXMtmY9bIqJZk/kb16
7C9s4dLJMauzH35YtuSFk2QxAnv0MhPOh7vCLBfznmo8Zf1+Qaov/aVOMxf1cdE0mLxCfqzy
ujJaZGbeZI/jdHCvcviBVbq2de/Jo6MDs9arfmCHFjv44LCDHLToWM8pMVNIxFVU8rtKY61Q
58UU2krdUC9d0lf1otxetUcj/u5i2k3xn5zlP7DxvyP+c+udzkL81+g03T2ntYv/vl38d8Jo
ZZcBHqWLcdFe+GVFLM86dVYZVBrsY4Jx3VjgDxhseWB8KtuP4JEx8yLEwzEo+EuNkC79QxkI
WPvxiFWfnlH3I3kfR14nxq06QnSBBC7IA7qsFdI9MSCEX0mhwYbtrI4SKVZZCg61my3Zeh6j
0a9LMMwoup6VUU/ApzryYTpnzf186Vfr2gukBQKjvXWGR4scJqr1OjdfiK4q2FgtLIK0gsbF
FSrvDDbQc710eHSux1Y5+YUrh0uOni14esYobiM3j+x8uZvPWFzlYBeEjj2sEXpBGHhNaObj
bRmIxrFaelvE1e4++hCghz/QyQ599IDCIEAVG2XnExAIiZnagoAe4s9kICzDjQzeaxdRzAY3
bAD6fxUv9VbpWo/uPTmye1Xsk+XLCt1nwp+wpAHdCF2mukSr++D4Q4LsV6pmD8hlq1lpeLUH
1FRb1XjQ/a+EME3NRDzzPfXDE7w8SpaVRxpaV4kGgQIXrPfgsFo4x5jfsrmW2WPQmc856i5V
//8g/4O/pDtMhv2vcQdsbf6/3mq75fW/2W639pxOy3F36/83yv9n2R+a8hS2pwcbMkKwKOE9
WPqdDi24MmqIcZ+8fGlpcR3W54gWO9JnhfCYnd7DMx1Cq6u/it4FUC9ehPqRao6O3gGrR0f4
/IzDvknvysH1ZxcepkPYnC2lc7JGd2MjJhMqjuXmuZnGJvjGCniqaljNApA6c29kVc1NRJvL
RJtLRDVQY6lmM8zSBRLd1GLrc2BuRqBFJLvAXM9ijcVq12pAdWsZGquBL13f3qqv9uq+2qv7
KkE3l6sbC9VqTghhRTWSLwuosxXTnUI6quvWLRfow6dLDy49N3oF6LXz8XBDd7kwHha7a9St
BgwUWG9Y+XOzV4AuXywvbMvp8gGYVb+QKtXXbQZ9aokxq1pB+7JYPftXzFUW6SBfYFBribgW
k3Tg09lMpLGWSMNiko57J5HmWiJNQpf/7hhOay2RFqgoDaehdHU9kfZaIm1Cd+lz43BowoFU
Zy2pjsXa+l99G1IP15J6CIaj/9XzXGVZbbSXV1TAh4so6cOSwv3Kp88WuWVbA62CcY4ff3KP
HzufJbC7HTCICh6bOV5jKzwLsEDMzZUkmhtJkMMAhIbEJUcBry316sjXArnW1uQyhu4m2t5M
1EVtdhDHdTVJrKpTVcZ6PYOq6yq0Jbdche6LapwcSFEqTFhnM0faH9FYabANPVjts6hJ96q9
GFXqjrVfo0rqm8nOH5Y6L96nQt3UYcdG3dRA62HcjTBLergJWOlh3XZbqIp1u9OSk2/XV+nj
3by1NsJ0cVaVltXtYldddDZa1aBGa9dGan+Kdm3kaFG7cJCNfJB/RMO0ghX7X141KYTdqF8E
sUYX8C621rHt4Brr4bTSrAXQ2oDUcg1Qb07+pjVrI6EycDsHXpZRFvJvlFMGtVIG7Pgxc7So
tgAlY3K0zO5GaBYRmpsRtBydXIhOLkFHi28rGu4mGu0yjVUreeHOan5DNrszDDu1YRZbBvSD
4YWjuE+ar8DzK9IfIkU81ayqPNCdwOa1uQlYjtjVrhQxnG3hG1vDN4vwjSX4v5bYD+UQcLm1
mj2bfopXkR5XdXf34BWsuy1rzXWsLZNWUmoS6BbDKA+iJbHlBudu7lp6HSEk2v7czaEGb1hO
T3NZiA6zU/NyTInXNvDXFRaDh9EKs4BGPNfXd5r1fTKLmbrOXIE1IKzs9w4FLF23CmtIWJmB
FdGyylV4I8KTl/8LOFSxdFnv3yv/9zXugK3N/zlOq3z+12g4bnPPrTud5i7/9z3k/5b+zMeK
v1ZRztSVr2l9SbIuHjurf86ft8sEjsnp7A8/hvjBsydbvvZ0eikeu5v/QoCb0Xa3p71VqmTB
IRP30+GqLUl256Z8aYgwsjanXqfuVR4w/1lb+eKu7HV5GLapVwNRra76IVcemcl7d8W7FsSJ
rAc2ssgiHIXftWvbla/g/7/GHZC1/r/ttJfuf3Tw9x+us/v9/3fh//Vfbcr9O53Q3+HWyZHI
Q/y1N0kr5sVPpxgNptEA7wJk5S2/vIzCgJXLrzy6YsvFpD/0sRCOdZG0SUEbUqe7I5f4qUjj
I9Ize9klx224tF8lsJOyBmly7962/dKi8YpurHixvFuKfjmlmnvy4/dz8TLiA+YlT74mN/CB
ZAmG6p9sxWCtdmLX8jKwho9GR+LislY7XcNdDYmfZGzVsgXLylatR/hBfxTqCD+AXAZ5uhVX
Jx+Cpx8C+C/6EDxbwwex8BQ/nm1Fk3bUFfrbn3JvXV1DGJspZghoCPoVPkvd/LkL6Gb/H/e/
wp//W3//r5GF/9L/1yH6d108/3dbO///rX7/d//ITgO6XCzvrdGlWnqCDSxeXKv05S2yPm5q
bbJW3xvAxnbz2qGjh3LVmj8PuLzVPNjFk7uyK7uyK7uyK7uyK7uyK7uyK7uyK7uyK7uyK7uy
K7uyK7uyK7uyK19Y/gcMrwTEAHgAAA--------------030408050505030807000305
Content-Type: text/plain;
 namearkov-chain.rb"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filenamearkov-chain.rb"

require 'pretty-print'

# 
# A Markov Chain contains a graph which edges are labeled with probabilities.
# The graph is stored as two hashes, one for the absolute probabilites, one for
# the relative probabilities. The nodes of the graph correspond to the keys of the hashes.
#
# The rand() method is worst case O(n), for small lists this is ok, but for
# large lists binary search will be better O(lg n)
#

class MarkovChain 

  attr_reader :order
  
  # Initializes the hashes.
  #
  def initialize(order  )
    @order  rder
    @absolute  }
    @relative  }
    @sum_edges  }
    @dirty  }
  end
  
  # The nodes of the graph correspond to the keys of the hashes.
  #
  def nodes
    @absolute.keys()
  end
  
  # Add an edge from node a to node b.
  #
  def add(a, b)
    # this dup is needed, otherwise elements will be overwritten
    a  .dup() if a.instance_of?(Array)
    @absolute[a] || } 
    @absolute[a][b] || 
    @absolute[a][b] + 
    @sum_edges[a] || 
    @sum_edges[a] + 
    @dirty[a]  rue
  end

  # Adds a list of elements pairwise.
  #
  def add_elems(elems)
    if ( 1 @order )
      a  il
      elems.each() do |b|
        add(a, b) if ( a )
        a  
      end
    else
      a  ]
      elems.each() do |b|
        if ( a.length() @order )
          add(a, b)
          a.shift()
        end
        a.push(b)
      end
    end
  end

  # Calculates all the relative cumulative probabilities.
  #
  def recalc_all()
    @relative  absolute.dup()
    @relative.each_pair do | a, hash |
      recalc(a) if @dirty[a]
    end
  end
  
  # Calculates the relative cumulative probabilities.
  #
  def recalc(a)
    cum  .0
    @relative[a]  absolute[a].dup()
    sum  sum_edges[a] * 1.0
    @relative[a].each_pair do | b, value|
      cum  um + ( value / sum )
      @relative[a][b]  um
    end
    @dirty.delete(a)
    @relative[a]
  end
  
  # Generates a random successor of node a according to the probabilities learned 
  # from the example texts.
  #
  def rand(a)
    recalc(a) if @dirty[a]
    if a.nil? || @relative[a].nil? || @relative[a].length() 0
      return nil
    elsif @relative[a].length() 1
      return @relative[a].keys[0]
    else
      # this is a linear search now with worst case O(n), TODO log(n) binary search
      value  ernel.rand()
      candidates  relative[a].select { |b, prob| prob > value }
      return candidates[0][0]
    end
  end

  def to_s()
    result  ]
    result << "MarkovChain"
    @absolute.each_pair do | key, hash |
      result << "#{key.pretty_to_s()}: ABS: #{@absolute[key].pretty_to_s()} - REL: #{@relative[key].pretty_to_s()}"
    end
    result.join("\n")
  end

end

--------------030408050505030807000305
Content-Type: text/plain;
 nametory-generator.rb"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filenametory-generator.rb"

# 

require 'markov-chain'

# A generator for stories. Fill the Markov chain with the methods add()
# or add_file() and generate a story with the method story().
#
class StoryGenerator

  attr_reader :mc, :order
  
  # Initializes.
  #
  def initialize(order  ) 
    @mc || arkovChain.new(order  rder)
    @order  rder
  end

  # Adds the words to the MarkovChain
  #
  def add(words) 
    @mc.add_elems(words)
  end

  # Adds a file to the MarkovChain.
  #
  def add_file(file) 
    puts "Reading file #{file}" if ( $VERBOSE )
    words  ords.parse_file(file)
    if ( $VERBOSE )
      puts "Read in #{words.length} words."
      puts "Inserting words from file #{file}"
      STDOUT.flush()
    end
    @mc.add_elems(words)
  end
  
  # Genereates a story with n words (".", "," etc. counting as a word) 
  #
  def story(n, initial_words  il)
    elems  enerate(n, initial_words)
    format(elems)
  end
  
  # Generates a story from the Markov Chain mc of length n and which starts with a
  # successor of word.
  #
  def generate(n, words)
    lexicon  mc.nodes()
    words  andom_word(lexicon) if words.nil?
    elems  ]
    1.upto(n) do |i|
      next_word  mc.rand(words)
      # if no word is word, take a random one from the lexicon
      if next_word.nil?
        words  andom_word(lexicon)
      else
        if 1 @order 
          words  ext_word
          elems << words
        else
          elems << words.shift()
          words.push(next_word)
        end
      end
      if ( i % LOG_WORDS_NUM 0 && $VERBOSE )
        puts "Generated #{i} words." 
        STDOUT.flush()
      end
    end
    elems
  end
  
  # Formats the elements.
  #
  def format(elems)
    text  lems.join(" ")
    text.gsub!(/\ ([.,!?;:'"-])\ /, '\1 ')
    text
  end

  # Returns a random (list of) word(s)
  #
  def random_word(lexicon)
    lexicon[rand(lexicon.length)]  
  end
  
end

--------------030408050505030807000305
Content-Type: text/plain;
 nameords.rb"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filenameords.rb"

# A parser for texts in native languages, for example english or german.
# 
# Example:
#   "To be    or not to    be."
#
# will be parsed into the array
#
# ["To", "be", "or", "not", "to", "be", "."]
#
class Words

  attr_reader :words
  
  # Returns the words of a file.
  #
  def Words.parse_file(filename)
    i  
    all_words  ]
    File.foreach(filename) do |line|
      ws  ords.parse(line)
      next if ws.nil?
      all_words + s
      i +  
      if ( i % LOG_FILE_READ_NUM 0 && $VERBOSE ) 
        puts "  Read #{i} lines." 
        STDOUT.flush()
      end
    end
    all_words
  end

  # Returns the words of a line.
  #
  def Words.parse(line)
    # replace newline
    line.gsub!(/\r\n/, '')
    return nil if line.length 0
    # seperate all special characters by blanks
    line.gsub!(/([,;!?:.])/, ' \1 ')
    # remove unused special characters
    line.gsub!(/[+\r\n()"]/, " ")
    line.gsub!(/\-+/, " - ")
    line.gsub!(/'[^t]/, "")
    # split the line into words
    words  ine.split(/[ ]+/).select { |w| w.length > 0 }
    words
  end

end

--------------030408050505030807000305
Content-Type: text/plain;
 nameretty-print.rb"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filenameretty-print.rb"

#


class NilClass
  def pretty_to_s()
    "nil"
  end
end

class Fixnum
  def pretty_to_s()
    to_s()
  end
end

class Float
  def pretty_to_s()
    to_s()
  end
end

class String
  def pretty_to_s()
    to_s()
  end
end



class Array

  def pretty_to_s()
    results  "["]
    self.each() do |e|
      results +  e.pretty_to_s(), "," ] 
    end
    results.pop() if results.length() > 1
    results +  "]" ]
    results.join("")
  end
  
end

class Hash

  def pretty_to_s()
    results  "{"]
    self.each() do |e|
      results +  e.pretty_to_s(), "," ] 
    end
    results.pop() if results.length() > 1
    results +  "}" ]
    results.join("")
  end
  
end

--------------030408050505030807000305
Content-Type: text/plain;
 nameain-markov-chains.rb"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filenameain-markov-chains.rb"

#!/usr/local/bin/ruby -w

require 'getoptlong'
require 'story-generator'
require 'markov-chain'
require 'words'

# Prints out the synopsis.
#
def synopsis()
  puts "ruby main-markov-chains.rb [--help] [--number_of_words N|-n N] [--width W|-w W] [--verbose|-v] [--order O|-O o] textfiles"
end

# these ugly global variables are used for printing out the progress
LOG_FILE_READ_NUM  000
LOG_WORDS_NUM  00

# command line option 
$VERBOSE  il
$NUMBER_OF_WORDS  00
$WIDTH  8
$ORDER  

opts  etoptLong.new(
  ["--help", "-h", GetoptLong::NO_ARGUMENT ],
  ["--number_of_words", "--num", "-n", GetoptLong::REQUIRED_ARGUMENT ],
  ["--order", "--ord", "-o", GetoptLong::REQUIRED_ARGUMENT ],
  ["--width", "-w", GetoptLong::REQUIRED_ARGUMENT ],
  ["--verbose", "-v", GetoptLong::NO_ARGUMENT ]
)

opts.each do |opt, arg|
  case opt
  when "--help"
    synopsis()
    exit(0)
  when "--number_of_words"
    $NUMBER_OF_WORDS  nteger(arg)
  when "--verbose"
    $VERBOSE  rue
  when "--width"
    $WIDTH  nteger(arg)
  when "--order"
    $ORDER  nteger(arg)
  end 
end

files  RGV

if files.length() 0
  STDERR.puts("Error: Missing argument")
  synopsis()
  exit(1)
end

# main
# our story generator
sg  toryGenerator.new(order  ORDER)
# feed all the files into it
files.each do |file|
  sg.add_file(file)
end  

# calculate the probabilities
if ( $VERBOSE )
  puts "Calculating probabilities ..."
  STDOUT.flush()
end
sg.mc.recalc_all()

# generate the story
if ( $VERBOSE )
  puts "Generating..."
  STDOUT.flush()
end
story  g.story($NUMBER_OF_WORDS)

# and print it out formatted
index  
last  
space  
while index < story.length()
  # remember the last non word element
  space  ndex if ( story[index, 1] /[ ,:;!?.]/ );
  if (index - last $WIDTH )
    raise "There is a word larger then the width" if ( last space+1 )
    puts story[last..space]
    index  pace
    last  pace + 1
  end
  index + 
end
puts story[last..-1]

--------------030408050505030807000305
Content-Type: text/x-vcard; charset=utf-8;
 nameoern.vcf"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filenameoern.vcf"

begin:vcard
fn;quoted-printable:Jörn Dinkla
n;quoted-printable:Dinkla;Jörn
adr;dom:;;;Hamburg
email;internet:joern / dinkla.net
tel;cell:+49(0)179 70 10 60 5
x-mozilla-html:TRUE
version:2.1
end:vcard


--------------030408050505030807000305--