--------------030408050505030807000305 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Hi again, I wrote: > Hi all, > > Here's my solution for the quiz No 74. It generates text with a first > order Markov Chain. Because the matrix of a Markov chain for native > language tests is sparse, the chain is stored in a hash of hashes for > better memory usage. Now i implemented higher order Markov Chains as well. I could not resist :-) A first order chain, for example the following one @mc3 arkovChain.new() @mc3.add(1,2) @mc3.add(3,4) is stored as the following hash of hashes {1 2}, 3 4}} A higher order chain, for example the following @mc7 arkovChain.new(order ) @mc7.add_elems([20,21,20,22,21,22,20,23]) is also stored as a hash of hashes, but the keys are arrays {[22, 21] 22}, [22, 20] 23}, [20, 21] 20}, [21, 22] 20}, [20,22] 21}, [21,20] 22}} Because i am new to Ruby, i had to learn a few things about Arrays, Objects and dup(). I still do not understand this fully (perhaps it is to late here 00:46 CET), but in the method MarkovChain.add() i need to dup the arrays, otherwise some keys will get overwritten. In class MarkovChain # Add an edge from node a to node b. # def add(a, b) # this dup is needed, otherwise elements will be overwritten a .dup() if a.instance_of?(Array) @absolute[a] || } @absolute[a][b] || @absolute[a][b] + @sum_edges[a] || @sum_edges[a] + @dirty[a] rue end I will try to find it out tommorow. By the way, below is an example text i generated for order 3 after i fed 4 books of Chesterton into the chain. Beginning with order 3 i think the text looks like a patchwork or collage of some phrases and sentences of the original texts. Best regards, Joern . Up to a certain point it was a wordless clamour startlingly close, and loud enough to be distinct if each word had not killed the other. No, replied Fisher. The Renaissance nobles of the Tudor time were like that. It is awful, I think it goes far enough! said Flambeau; but my information is fragmentary, and only twelve members of the Central Anarchist Council. The gentleman who has for some time past played, with propriety and general applause, the difficult part of Thursday, has died quite suddenly. Consequently, he was willing to go into Parliament as a yeoman or a gentleman or a Jacobite or an Ancient Briton, I should say that, Crane went on. This also tempted him, that in a more partial and also a more premature fashion, for his voice was colourless and sad. What I did next does matter: I gave him a rather wild stare, March had yet another unexpected emotion, for his guide hailed the man as Hoggs and introduced him as Sir Howard Horne, then introducing his so-called Socialist budget, and prepared to expound it in an interview with so promising a penman. Harold March was to have the gold of Glengyle. So far the crime seemed clear enough; and while Boyle was looking into it he heard a thud behind him, and then a sudden stillness, as of a stone statue walking. He might have been firebrands. The mutinies simmered down; the men who must have known that particular house to be so accurately inaccurate. But what makes you think it a specimen. Put the same feather with a ribbon and an artificial flower and everyone will think it a wildcat, though it may have been a clever fellow, answered the priest. As they raced along to the gate out of which poured and rolled, not Roman, but very late, and that this, my successful masquerade, might be of considerable value to the public, placed it here with his own pale blue ones, and said, Do you want me to tell you all you want to guess what he doing, keep behind him. About his life and fortune on the table, and went below the surface in a way that at once puzzled and reassured him. On the oranges was the equally clear and exact description, Finest Brazil nuts, 4d. a lb. M. Brun had a dark, handsome lady, of no little majesty, and rather like a mosaic palace, rent with earthquakes; or like a Dutch tulip garden blown to the stars. The other did not answer; he was free in free London, and drinking -- Joern Dinkla, http://www.dinkla.net --------------030408050505030807000305 Content-Type: application/x-gzip; name oern_dinkla_quiz_74_ver2.tar.gz" Content-Transfer-Encoding: base64 Content-Disposition: inline; filename oern_dinkla_quiz_74_ver2.tar.gz" H4sICDzeOkQCA2pvZXJuX2RpbmtsYV9xdWl6Xzc0X3ZlcjIudGFyAO08bXvbOI75evoVrNKd 2FtZseS3Pumk3bRNO52bNrNpuv3g+vzQNmNrI0tevdTJNL3ffgBI6sVvcWd6ne6dOR1bIgEQ BAEQBOkEIrFHXnDlcztKBzf2v1LvN7vTPDw/PXn++tROrpO9P1zqUNrN5l5dv6vvRrPtaBin 3nTaLSjNxp7TbLUajLH63jcoaZzwCFgMwkE4ulkPd1f7v2m5mHgxG3mRGCZhdMOGYZBwL4jZ FJ9Hgn0UUeyFAXPZZRixZCLYOagJ+zuoCXsTsk6TGRU2SZLZ0eHhfD4nJSIdGobTw6ptGK8S NhaBiHgiYpaI64TNvWTCXvPoKvzInk2ot/CS8eCGhdFIRDZ7KoY8jQX1NuVJ5F0TADMU0hCR iJ+AJ95HwXwejFM+BgwRJzGDEcUzHsXCIhISHCthiGLE4AVoTXg8QbL4DZwhtYFIEhGxqZii KNIYKMIALiYiEozD/3E4FSycCSQWp4DkAX2wHYsN0gReRt6IBWECJEfUceIBfBIybzrzxVQE iQG1U+M/arqnmYig3ykPhgJZQRw5RBLLfsSDUaUK/CSTcIRo8I5wwzCNpHgUY5z50C20FJkj MUcinsHUIhcgqzTiPjMYlUxmsygcijj2grESF077KAQKOBZoHaXA3jjiU5gLbwgUPnIfRqro xDAuAQOILYBn/0rDRD2CAky5xUQyRC1AQkQDJ4D48QWPAqlNOC4ppyiE6ZS6Bwyx+MrzfSA3 UBrhsTDwb3DyAMy1GmwuxBWpj7gGYXrIiBw40rXZKzYI0/EkoYGZvyoWkDK2m9AaXjEDOHiO BC8m4ZTHwLHNuG8TV+gcEkHcRYJL1ueh6pbYhrcRv4GXcUgVhgeaIwcppwW79nmcsOZDNsGZ Q7ZIbwozSKoFOpvEWhFoGow5QeRwSnHi0E8TMEuAjlAd/BG06EYQBUx7rDUdlBkImWiYXjwx 2Zzf0IQQlz/zj5zY/lVEvmIYsDKeb2CS5EA9kKvvI5acsyCcA2ScoFQM4BkJ4MiGPABRgQ0N 0JRuChOjVFIZXG560J9EJl80hxkw+PxqzqORUnjQL2BtBiZhsxOsAeOPPJA642iDeo6Uzkg/ BZTQDUGzEaejkUC1gRn0vd9wNicc7RXtNAgDZXtenJl1oNTnPYzxXuaOgNQVjhqFc5kGQ5wA sIbMkH7i8ZUAEXmMT6V0UacTdH1hBEo85bMD+PL5dDDiBzBKNA0lW3ByEp460jjUOPDD4ZWe FGVF2uNINfKUQQD6WCRywgv2QMqjxwQzf8luwpQRBjpdtFOwYeBMTTFo0tjnSueHwiM4nIt4 LmgJmAKV0+sk4kOp3TwaTgDKMO4zmAp2/dsl+2cooqAvA4s+TkS/07Sh0R7/ZhjPOPRSGA2R n0VekCjNB8cBNb4HUxPfwIJ0bSPtIczWmmjF9wYAgFUgZS+oTcmL1sjz29GA1WoT4c8M4yms DzCoMehWbBnGz8gle04ELfYjMf03RR+6emyxn38I06n/KIMyjP9j6/8GiR4uiTIGWX7N+K+9 HP+5jc6e02k32rv475uU/XuHaRwdgo/h/uHACw7JiGpzw4gE6AJ4jgPwKeEs8cNgfJBXYix1 U1ORXRgVWooKU6ieQ3AXHxjGPvsVLR0WgFSaO1h4OIu92Db2jZG4zN4rVYgwZilAmqsNG7WR daVp9/AhSHHR6YeXfeqMvbmtBewNNc29ESw9729rc/aeKiCuHYSxuK19pFeKPNnZbe2MhT0K Uy89X8SmIYIR8gyMgmNOx7CKjP1wQEFQ5PEBwNC6TO4eVx5yY7gM6eGRj4Pl2Pjl7GX/xatf Tvu4teq/efeaHTMHLIMa3p+dP3+bV2KXJTcIE4BRuHH/H6fnT8/engJY4PnGfcB4enreP3sh KSjs++9fPb/4CV46D437UH96jg2GAVRieHpJE/oLTCh4uXkFxNw1pRhNi5m1CXzmIEdHb876 J+cv370+fXPBepaCXpA1IWIlPQQLFM5P//7u1fnp8xV0SPASGx7pIdwem6aVkObbI6mpJ7SP m8ZqVKXIbMGHEwiJ2S28WTDf41sgNeQxzQs8QpQWMC1CiotLSsxgNfaSSr1aAF2UH4GtmM5X EF2PRVSBTovoegwSLVeKJEpFAUzKRwJpnVhDUU6EBNUqswAqMLwlgyDjAACQ1T8Mw7tkVGH7 IhgnE9iyHB+zOiC8vQA65zYaccU8jaIwOmKvPdproBRTjDtMpFwSFwnLqWrTQ7uHL4icaQN3 wzKnY8Rj4OEtVr7UdaTR0pqP1UCqgH0pwD512CG5BzvF8EyOJZ9ifMXpjcc2H436+FrBD+KH MbJM7g9THza02sAHfOD5XuKJGGVRySckd2HPFA4OvYTBbNs2pazO3l3Yl34agwRp8MDCdGhD DAa4fWAeqqF3vZuWzhMHv7ZTJRXoc0MnJNNjHC89VhaVkDrNIzTY16Bro30rbDdGhheMxDXD CacdAz7A7hviTnyaT0B0TIL8KLnNtAT42c+3CtmWAyJyhjbBhNo0M6bpSTo0WiLVpQqLOT12 /N/ssMuso0f3nti9Q1Z9BGgIKFFqkjSopTIDaZYR98CGTbnDh/ify4590E3JkQy6pRXJfjUd YumBowiRuCVLCGDb1NyjNi0fqjLk1pvkJEf1ABwzmZahQR+gr8aKJao1p2d80/ivFEV/9fxf y3WbC/Ffu91y9+ChUd/Ff9+kZPHZLBJJclMjI6cwDf6dlNJ0eXKQYzpoNoG1wwOvKUZjFQX5 fABGO5L7/JKXg+AOtvdC4eW5ONjAYwZFZgIshptxnWfkA8pyFB1sAUTGZOA+fJkCLPdGfQXh SGTpFNnxMIwwIxbKfBPWX4mbDEZygXGoZLaUf0OmwTuA5dLCf1YJqhbxGk9xXfE9zDyqLAIL VU5QMkr+REFAiM3B38YCN83Zfltlbs4q/pgFsF4ZxhAsPi5mA2HlYQw8btTHRBQAH9EqB5Xk R18FMHJKb8SlsUAb/I9xtZdBZOujI73X3/S7pghVmfSP2afPsiqTdV4Vp9O+nP68buRFyY1+ l36NOPxDU5IPg0iUebQRo1Itd3cywpQFKSe7jMIpIYLmYiYUnwZForDSVzhMmZTHvpzHUTrD uQwgdBAjUDzMws1xwVALU5xNXwjh2DzyYA4DIsBh+NwGfNAeWDW4jYkyzPJCvPekchJF/KZa HkKX99jtLQqNLTV0B7KtvrKFVovyZGhi9RX1OTjNE1ZlUSMFXEp2aOSosZRb1eOdcY8ksCC6 PgLEFfqU46Kl0sF1UulWVWWLudq4yDfCoNALBIXB1+BWtbB8SiQxnpGQRAaaBM04kopFoY9u b4suJOFS1LrAbomRYp0dT7zLpJLXaT5k6wzjqwwj45G+MynrkBCdp4pMMxMbptN0pWcrSL4U GC5aaG4cpIbldpJHH2eTRMJgeHQcokUjKVc4yT5TlIUxKDsrjOJ3jgD6IdKAgTGjXS8zK1W0 qPaFIcWEU1bxvzJnBY3FMQ8sPMdIhR6z7B0/H4BWUBM7JPp6GovU0PII2ihOsBSVPQKlS4Qe VhGtLLqX2bkYp7UGnFScDvEwBpaV8FJ7LD4EH0mpbeUgy/sHOkcRI0mSPB3CiGuOp06UyijL HBc1xdqGeSa3BZb6BDxJWY4rKxd3fpJ6kkZBZu5godjLOiynjFWCQ//erfcWDX0/W285ZUl4 pNdVTNNTELK0Yl+cPT9jfjiGl/JarGjKiT9m/ylAqL4tQwCtIvDijWjCjssMxjDjw4R9Avdi 0fTc0id7rOh9Lg8uJwSjygZW9A44U0nY1/kDWCFTP8n9mnr/8UdmFkIEc2FZXNB4kOKSnWdk 9j9Bsy2DwL7s+fMRO3n69ojtf8qND4B6C1CwuTo//YXAMpGsADNLliJ7tv8ZekHF/BCYevGm fc/ernwX+f/iluB/Y//nuM12ef/XbNTdBuz/2s4u//+N8v+G3mu88fxn+KC8T8l6yWZN8OSm slOyVYn3wrsOaB1cjZU9LqL5IU++GOttAso4/gI0jUhBt7EWUfoj9Otds2tKJwtO/bIYOYqy 24wxlu4yUfZzFjMtk/XYCm8X27NQ7Qh0RbYAPlbrX4m02QNKJQLSX5qFrU5BNj+Ba99uhJ++ mxF+3m6EOz/9Z/j/hTO+370ErD3/bXXchfyf03Hqe06nUe/s/P+38f/MWHd0SwnAbPZlrgse KL/2wlM71tKVPAq56dYepcxi2jvj8QugZocpVTpMyE4xuDrVWcCVtZUqZuO06y+e8yxlw6ZD a31K7M48mMq6TIeUOCnE1KXzJPpekzMrJ0/o2IAOodWerUByIe9EYDkDdp5QkS0rEjMoxxWE F9My+dkVy08pzHMQGN2TQiL7n/DrszrcKB0hMTWAY/Yev226UVk8D8tSKAtoxY7w3tH+JyKk loLPkqxtlmBfBbGI6HBMdkr72BKHCnzhBKu0+14vvOKGW6gdd0HtAtVrxbRNubzR1axhmMqz fK6PhkiQWsRSQwNLa1RfywviJMUZcgIVWtmXgCWYPEvL82drUgSS4WyHX0rNg+KGl0xKGIaD FiaT83Q5Tt2840SylGJALopqU+S0wKEvrr1hGOC2F4RMKdhKWUdk/oJGVVHQFAfIycacQUki ah/r2OksCXEvjuGHp8OPQFwnREp1SLvwAjs4DA/zI/K8TmbmRxZL+JXIcyl0UqClpXgyssxf 1kfO3F3jyVKKWQaCKBXznIUcYaYNuqNCmxQD7LsJKE8iFikvQi2lHFUXMtmY9bIqJZk/kb16 7C9s4dLJMauzH35YtuSFk2QxAnv0MhPOh7vCLBfznmo8Zf1+Qaov/aVOMxf1cdE0mLxCfqzy ujJaZGbeZI/jdHCvcviBVbq2de/Jo6MDs9arfmCHFjv44LCDHLToWM8pMVNIxFVU8rtKY61Q 58UU2krdUC9d0lf1otxetUcj/u5i2k3xn5zlP7DxvyP+c+udzkL81+g03T2ntYv/vl38d8Jo ZZcBHqWLcdFe+GVFLM86dVYZVBrsY4Jx3VjgDxhseWB8KtuP4JEx8yLEwzEo+EuNkC79QxkI WPvxiFWfnlH3I3kfR14nxq06QnSBBC7IA7qsFdI9MSCEX0mhwYbtrI4SKVZZCg61my3Zeh6j 0a9LMMwoup6VUU/ApzryYTpnzf186Vfr2gukBQKjvXWGR4scJqr1OjdfiK4q2FgtLIK0gsbF FSrvDDbQc710eHSux1Y5+YUrh0uOni14esYobiM3j+x8uZvPWFzlYBeEjj2sEXpBGHhNaObj bRmIxrFaelvE1e4++hCghz/QyQ599IDCIEAVG2XnExAIiZnagoAe4s9kICzDjQzeaxdRzAY3 bAD6fxUv9VbpWo/uPTmye1Xsk+XLCt1nwp+wpAHdCF2mukSr++D4Q4LsV6pmD8hlq1lpeLUH 1FRb1XjQ/a+EME3NRDzzPfXDE7w8SpaVRxpaV4kGgQIXrPfgsFo4x5jfsrmW2WPQmc856i5V //8g/4O/pDtMhv2vcQdsbf6/3mq75fW/2W639pxOy3F36/83yv9n2R+a8hS2pwcbMkKwKOE9 WPqdDi24MmqIcZ+8fGlpcR3W54gWO9JnhfCYnd7DMx1Cq6u/it4FUC9ehPqRao6O3gGrR0f4 /IzDvknvysH1ZxcepkPYnC2lc7JGd2MjJhMqjuXmuZnGJvjGCniqaljNApA6c29kVc1NRJvL RJtLRDVQY6lmM8zSBRLd1GLrc2BuRqBFJLvAXM9ijcVq12pAdWsZGquBL13f3qqv9uq+2qv7 KkE3l6sbC9VqTghhRTWSLwuosxXTnUI6quvWLRfow6dLDy49N3oF6LXz8XBDd7kwHha7a9St BgwUWG9Y+XOzV4AuXywvbMvp8gGYVb+QKtXXbQZ9aokxq1pB+7JYPftXzFUW6SBfYFBribgW k3Tg09lMpLGWSMNiko57J5HmWiJNQpf/7hhOay2RFqgoDaehdHU9kfZaIm1Cd+lz43BowoFU Zy2pjsXa+l99G1IP15J6CIaj/9XzXGVZbbSXV1TAh4so6cOSwv3Kp88WuWVbA62CcY4ff3KP HzufJbC7HTCICh6bOV5jKzwLsEDMzZUkmhtJkMMAhIbEJUcBry316sjXArnW1uQyhu4m2t5M 1EVtdhDHdTVJrKpTVcZ6PYOq6yq0Jbdche6LapwcSFEqTFhnM0faH9FYabANPVjts6hJ96q9 GFXqjrVfo0rqm8nOH5Y6L96nQt3UYcdG3dRA62HcjTBLergJWOlh3XZbqIp1u9OSk2/XV+nj 3by1NsJ0cVaVltXtYldddDZa1aBGa9dGan+Kdm3kaFG7cJCNfJB/RMO0ghX7X141KYTdqF8E sUYX8C621rHt4Brr4bTSrAXQ2oDUcg1Qb07+pjVrI6EycDsHXpZRFvJvlFMGtVIG7Pgxc7So tgAlY3K0zO5GaBYRmpsRtBydXIhOLkFHi28rGu4mGu0yjVUreeHOan5DNrszDDu1YRZbBvSD 4YWjuE+ar8DzK9IfIkU81ayqPNCdwOa1uQlYjtjVrhQxnG3hG1vDN4vwjSX4v5bYD+UQcLm1 mj2bfopXkR5XdXf34BWsuy1rzXWsLZNWUmoS6BbDKA+iJbHlBudu7lp6HSEk2v7czaEGb1hO T3NZiA6zU/NyTInXNvDXFRaDh9EKs4BGPNfXd5r1fTKLmbrOXIE1IKzs9w4FLF23CmtIWJmB FdGyylV4I8KTl/8LOFSxdFnv3yv/9zXugK3N/zlOq3z+12g4bnPPrTud5i7/9z3k/5b+zMeK v1ZRztSVr2l9SbIuHjurf86ft8sEjsnp7A8/hvjBsydbvvZ0eikeu5v/QoCb0Xa3p71VqmTB IRP30+GqLUl256Z8aYgwsjanXqfuVR4w/1lb+eKu7HV5GLapVwNRra76IVcemcl7d8W7FsSJ rAc2ssgiHIXftWvbla/g/7/GHZC1/r/ttJfuf3Tw9x+us/v9/3fh//Vfbcr9O53Q3+HWyZHI Q/y1N0kr5sVPpxgNptEA7wJk5S2/vIzCgJXLrzy6YsvFpD/0sRCOdZG0SUEbUqe7I5f4qUjj I9Ize9klx224tF8lsJOyBmly7962/dKi8YpurHixvFuKfjmlmnvy4/dz8TLiA+YlT74mN/CB ZAmG6p9sxWCtdmLX8jKwho9GR+LislY7XcNdDYmfZGzVsgXLylatR/hBfxTqCD+AXAZ5uhVX Jx+Cpx8C+C/6EDxbwwex8BQ/nm1Fk3bUFfrbn3JvXV1DGJspZghoCPoVPkvd/LkL6Gb/H/e/ wp//W3//r5GF/9L/1yH6d108/3dbO///rX7/d//ITgO6XCzvrdGlWnqCDSxeXKv05S2yPm5q bbJW3xvAxnbz2qGjh3LVmj8PuLzVPNjFk7uyK7uyK7uyK7uyK7uyK7uyK7uyK7uyK7uyK7uy K7uyK7uyK7uyK19Y/gcMrwTEAHgAAA --------------030408050505030807000305 Content-Type: text/plain; name arkov-chain.rb" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename arkov-chain.rb" require 'pretty-print' # # A Markov Chain contains a graph which edges are labeled with probabilities. # The graph is stored as two hashes, one for the absolute probabilites, one for # the relative probabilities. The nodes of the graph correspond to the keys of the hashes. # # The rand() method is worst case O(n), for small lists this is ok, but for # large lists binary search will be better O(lg n) # class MarkovChain attr_reader :order # Initializes the hashes. # def initialize(order ) @order rder @absolute } @relative } @sum_edges } @dirty } end # The nodes of the graph correspond to the keys of the hashes. # def nodes @absolute.keys() end # Add an edge from node a to node b. # def add(a, b) # this dup is needed, otherwise elements will be overwritten a .dup() if a.instance_of?(Array) @absolute[a] || } @absolute[a][b] || @absolute[a][b] + @sum_edges[a] || @sum_edges[a] + @dirty[a] rue end # Adds a list of elements pairwise. # def add_elems(elems) if ( 1 @order ) a il elems.each() do |b| add(a, b) if ( a ) a end else a ] elems.each() do |b| if ( a.length() @order ) add(a, b) a.shift() end a.push(b) end end end # Calculates all the relative cumulative probabilities. # def recalc_all() @relative absolute.dup() @relative.each_pair do | a, hash | recalc(a) if @dirty[a] end end # Calculates the relative cumulative probabilities. # def recalc(a) cum .0 @relative[a] absolute[a].dup() sum sum_edges[a] * 1.0 @relative[a].each_pair do | b, value| cum um + ( value / sum ) @relative[a][b] um end @dirty.delete(a) @relative[a] end # Generates a random successor of node a according to the probabilities learned # from the example texts. # def rand(a) recalc(a) if @dirty[a] if a.nil? || @relative[a].nil? || @relative[a].length() 0 return nil elsif @relative[a].length() 1 return @relative[a].keys[0] else # this is a linear search now with worst case O(n), TODO log(n) binary search value ernel.rand() candidates relative[a].select { |b, prob| prob > value } return candidates[0][0] end end def to_s() result ] result << "MarkovChain" @absolute.each_pair do | key, hash | result << "#{key.pretty_to_s()}: ABS: #{@absolute[key].pretty_to_s()} - REL: #{@relative[key].pretty_to_s()}" end result.join("\n") end end --------------030408050505030807000305 Content-Type: text/plain; name tory-generator.rb" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename tory-generator.rb" # require 'markov-chain' # A generator for stories. Fill the Markov chain with the methods add() # or add_file() and generate a story with the method story(). # class StoryGenerator attr_reader :mc, :order # Initializes. # def initialize(order ) @mc || arkovChain.new(order rder) @order rder end # Adds the words to the MarkovChain # def add(words) @mc.add_elems(words) end # Adds a file to the MarkovChain. # def add_file(file) puts "Reading file #{file}" if ( $VERBOSE ) words ords.parse_file(file) if ( $VERBOSE ) puts "Read in #{words.length} words." puts "Inserting words from file #{file}" STDOUT.flush() end @mc.add_elems(words) end # Genereates a story with n words (".", "," etc. counting as a word) # def story(n, initial_words il) elems enerate(n, initial_words) format(elems) end # Generates a story from the Markov Chain mc of length n and which starts with a # successor of word. # def generate(n, words) lexicon mc.nodes() words andom_word(lexicon) if words.nil? elems ] 1.upto(n) do |i| next_word mc.rand(words) # if no word is word, take a random one from the lexicon if next_word.nil? words andom_word(lexicon) else if 1 @order words ext_word elems << words else elems << words.shift() words.push(next_word) end end if ( i % LOG_WORDS_NUM 0 && $VERBOSE ) puts "Generated #{i} words." STDOUT.flush() end end elems end # Formats the elements. # def format(elems) text lems.join(" ") text.gsub!(/\ ([.,!?;:'"-])\ /, '\1 ') text end # Returns a random (list of) word(s) # def random_word(lexicon) lexicon[rand(lexicon.length)] end end --------------030408050505030807000305 Content-Type: text/plain; name ords.rb" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename ords.rb" # A parser for texts in native languages, for example english or german. # # Example: # "To be or not to be." # # will be parsed into the array # # ["To", "be", "or", "not", "to", "be", "."] # class Words attr_reader :words # Returns the words of a file. # def Words.parse_file(filename) i all_words ] File.foreach(filename) do |line| ws ords.parse(line) next if ws.nil? all_words + s i + if ( i % LOG_FILE_READ_NUM 0 && $VERBOSE ) puts " Read #{i} lines." STDOUT.flush() end end all_words end # Returns the words of a line. # def Words.parse(line) # replace newline line.gsub!(/\r\n/, '') return nil if line.length 0 # seperate all special characters by blanks line.gsub!(/([,;!?:.])/, ' \1 ') # remove unused special characters line.gsub!(/[+ \r\n()"]/, " ") line.gsub!(/\-+/, " - ") line.gsub!(/'[^t]/, "") # split the line into words words ine.split(/[ ]+/).select { |w| w.length > 0 } words end end --------------030408050505030807000305 Content-Type: text/plain; name retty-print.rb" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename retty-print.rb" # class NilClass def pretty_to_s() "nil" end end class Fixnum def pretty_to_s() to_s() end end class Float def pretty_to_s() to_s() end end class String def pretty_to_s() to_s() end end class Array def pretty_to_s() results "["] self.each() do |e| results + e.pretty_to_s(), "," ] end results.pop() if results.length() > 1 results + "]" ] results.join("") end end class Hash def pretty_to_s() results "{"] self.each() do |e| results + e.pretty_to_s(), "," ] end results.pop() if results.length() > 1 results + "}" ] results.join("") end end --------------030408050505030807000305 Content-Type: text/plain; name ain-markov-chains.rb" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename ain-markov-chains.rb" #!/usr/local/bin/ruby -w require 'getoptlong' require 'story-generator' require 'markov-chain' require 'words' # Prints out the synopsis. # def synopsis() puts "ruby main-markov-chains.rb [--help] [--number_of_words N|-n N] [--width W|-w W] [--verbose|-v] [--order O|-O o] textfiles" end # these ugly global variables are used for printing out the progress LOG_FILE_READ_NUM 000 LOG_WORDS_NUM 00 # command line option $VERBOSE il $NUMBER_OF_WORDS 00 $WIDTH 8 $ORDER opts etoptLong.new( ["--help", "-h", GetoptLong::NO_ARGUMENT ], ["--number_of_words", "--num", "-n", GetoptLong::REQUIRED_ARGUMENT ], ["--order", "--ord", "-o", GetoptLong::REQUIRED_ARGUMENT ], ["--width", "-w", GetoptLong::REQUIRED_ARGUMENT ], ["--verbose", "-v", GetoptLong::NO_ARGUMENT ] ) opts.each do |opt, arg| case opt when "--help" synopsis() exit(0) when "--number_of_words" $NUMBER_OF_WORDS nteger(arg) when "--verbose" $VERBOSE rue when "--width" $WIDTH nteger(arg) when "--order" $ORDER nteger(arg) end end files RGV if files.length() 0 STDERR.puts("Error: Missing argument") synopsis() exit(1) end # main # our story generator sg toryGenerator.new(order ORDER) # feed all the files into it files.each do |file| sg.add_file(file) end # calculate the probabilities if ( $VERBOSE ) puts "Calculating probabilities ..." STDOUT.flush() end sg.mc.recalc_all() # generate the story if ( $VERBOSE ) puts "Generating..." STDOUT.flush() end story g.story($NUMBER_OF_WORDS) # and print it out formatted index last space while index < story.length() # remember the last non word element space ndex if ( story[index, 1] /[ ,:;!?.]/ ); if (index - last $WIDTH ) raise "There is a word larger then the width" if ( last space+1 ) puts story[last..space] index pace last pace + 1 end index + end puts story[last..-1] --------------030408050505030807000305 Content-Type: text/x-vcard; charset=utf-8; name oern.vcf" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename oern.vcf" begin:vcard fn;quoted-printable:Jörn Dinkla n;quoted-printable:Dinkla;Jörn adr;dom:;;;Hamburg email;internet:joern / dinkla.net tel;cell:+49(0)179 70 10 60 5 x-mozilla-html:TRUE version:2.1 end:vcard --------------030408050505030807000305--