On 5/10/05, Nuralanur / aol.com <Nuralanur / aol.com> wrote: > now you have confused me almost totally. Sorry about that. I have a tendency to do that. :) I'll think of examples that in my mind are clear but introduce incidental complexities that don't have anything to do with the problem at hand and end up muddying the water. > Actually, I was first responding to the subject because I had sowhat vaguely > in mind that it might help to formulate the problem in as a shortest-path > problem (define a cost function and try to minimise it. For example, in the > song problem, pay me $1000000 each time the subsequent song doesn't > start with the letter the previous ended with. Otherwise, pay nothing. If you > have to pay nothing after going through the whole path, you have found a > correct solution and I am still poor. > > For this sort of problem, there is a way to find the "single-source shortest > path solution" called Dijkstra's algorithm, which is nicely described here: > > _http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/dijkstra.html_ > (http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/dijkstra.html) Yes, I'm familiar with Dijkstra's algorithm and the connection to incidence matrices. That's why in my first post I said "Though an incidence [matrix] can work (and work quite well) for a simple graph..." (mistype of 'graph' for 'matrix' in the original post corrected). I'll reexplain the case for multiple edges per node pair below and why Djikstra's algorithm (in its original form) is not applicable. > My confusion is about your use of multiple criteria/connections/... . Ok, and this is where I muddied the water. My apologies on that. Ignore the multiple criteria, and let's deal with an example that only has one weight per edge. Let N be the set of letters/characters with which a song may begin or end, and E a set of (directed) edges representing songs, where each edge connects the node representing the first character of the song to the node representing the last character of the song. So, for instance, the edge "Fingertips"[1] would connect the nodes [F] -> [S]. Now we can represent a graph G = { N, E }. Let also each edge be weighted by the length of the song. The goal is to determine the quickest (not necessarily shortest in number of songs, but shortest in running time) song chain to get from a starting letter to another letter B. This is a simple matter of finding the cheapest path in our graph between the nodes [starting letter] => [ending letter]. Djikstra's algorithm could be useful here, except (as you say): > In the original version of Dijsktra's algorithm, there is only one connection > from each edge to each other. Elaboration: Djikstra's algorithm uses an incidence matrix in which each entry in the matrix represents the weight of the edge between the indices. Obviously then, the matrix can only represent one edge per index pair, where the indices are nodes. So Djikstra's algorithm only allows one edge per node pair. But in our example multiple edges can connect the same pair of nodes. For example, we connected [F] -> [S] by "Fingertips" above. But the song "Fake Out In Buenos Aires"[2] also connects [F] -> [S]. So you can't just expect one edge, and Djikstra's algorithm is no longer available (in it's original form... there may be an adaptation for this, but I've never seen one -- I would love to be wrong). Ok, so we've seen an example where there are multiple edges per node pair, and we've seen why Djikstra's algorithm wouldn't work in that model. But Djikstra's algorithm is cool and /fast/! So why not try a different representation of the model the *would* allow Djikstra's algorithm? Instead of having songs as edges, lets try songs as nodes (as you mentioned). What should we represent in edges? A logical choice would be to have a directed edge from song A to song B iff song B can follow song A according to our rules. So there'd be an edge from "Fingertips" to "Sapphire Bullets of Pure Love"[3] (s to S), but no edge from "Sapphire Bullets of Pure Love" to "I Palindrome I"[4] (e to I). That builds a working graph, but there are problems: (1) Where do we start? Where do we end? Maybe we can use an extra node that represents the start state and add edges to all songs starting with the desired start letter. Same for the goal state -- we add a node and connect from all songs ending with the desired letter to it. The problem is that we can't reuse the same graph for multiple runs -- if we change out start/goal conditions, we need to change the graph. (2) How do we weight the edges? For our example the weight appropriately belongs to the song, which is now a node instead of an edge. We can weight all edges coming out of a song node with the length of the song. This together with the solution to (1) makes for a viable graph. But I contest that the shift of weight from the appropriate entity (song) to multiple edges is a duplication and obfuscation of the model. (3) The revised model is much, much larger. Assume a database of N songs beginning or ending with one of M letters/characters. The actual distribution of connections between songs is probably much more complex, but let's just assume each character is equally likely. So N/M songs end with a given character and N/M songs will start with the same given character. So we'll need (N/M)^2 edges each for M characters, or a total of (N^2)/M edges. So our graph has N nodes and (N^2)/M edges. Compare that to M nodes and N edges (one edge per song). Add to that the fact that N is going to be orders of magnitude greater than M and you'll notice the difference. Simply, I just wanted to point out that there are situations in which it makes sense to have a graph with multiple edges per node pair. In those situations, the original version of Djikstra's algorithm using incidence matrices is not applicable. Jacob Fugal [1] http://www.tmbw.net/wiki/index.php/Fingertips [2] http://www.tmbw.net/wiki/index.php/Fake_Out_In_Buenos_Aires [3] http://www.tmbw.net/wiki/index.php/Sapphire_Bullets_Of_Pure_Love [4] http://www.tmbw.net/wiki/index.php/I_Palindrome_I