18 Kasım 2012 Pazar

Digraph Code - I

Hi, welcome back for an English blog entry. I will be writing about stuff I coded in last moths and I think some of them will be useful for anyone interested in learning about the Linux kernel, Linux system programming or C++/STL.

Well, I wrote some code which finds the exit of a 2d maze by using a stack, back when I saw a problem in the Larry Nyhoff's book. One can call it some sort of "depth-first search". Then I enhanced it to find the shortest path possible. You can read its code on GitHub. It's very crude and may be inefficient but I was just beginning to use C++ back then.

So I later found out (in the same book) that shortest paths can be found much more easily by using breadth-first search on a directed graph (namely Dijkstra's method). And I set off to write my own digraph class right away. Well, it took more time than I anticipated but more on that below.

Finally I want to say writing a digraph class/container helped me see how the design process is important. Also I realized I had the whole encapsulation principle wrong in OOP. It took time but it was well worth it. Unfortunately some parts are a little cryptic but it works alright.

You can get the source from GitHub.

Design


I thought I would hold the nodes(vertices) of the graph in a container class (a vector at first) and arcs(edges) would just be pointers to the target node which are stored in source node .

I didn't want to use a matrix representation since that would hold up too much space for large graphs which have a little number of arcs/edges per node. (I was planning to use the digraph class to solve shortest path problem on the 2d discrete orthogonal map I mentioned above. A digraph implementation would contain only a struct coord{int x,int y} and at most 4 arcs per node.)

I decided to preserve the arcs from one node to another in a linked list in the source node (arcLink *firstArcToOther in code). But I realised if I hold the arcs only in the starting node, removing a node became very expensive. You see, to remove a node you have to remove all the arcs coming to it from all other nodes as well. And if the node doesn't know who sends arcs to it, you have to visit all other nodes and search all of their linked lists, looking for arcs to the node you want to remove. That means treading on almost any pointer in the whole data structure. So I decided to hold pointers to the nodes starting an arc to the node in a seperate linked list in the node (some stackish linked list starting with arcLink *topArcFromOther in code).
I also thought accessing the number of arcs a node starts to other nodes would be frequently needed. So I added a tweak, you can access the last element of the linked list (arcLink *lastArcToOther in code) and read its index to see how many arcs that node starts to other nodes. You don't need to traverse the whole list of nodes. (If you want to see how many arcs coming from other nodes you have to traverse the list beginning with topArcFromOther though.)

One final note about arcs; having more than one arc from one fixed node to another fixed node is possible but you can't know which one of the arcs will be removed when you remove one arc. You need to use removeArcW if you want to remove an arc with a specific weight. It should not be a problem if all arcs are identical though.

Nodes are the core of the whole structure. They hold the data (of type elementType in template) and linked lists of arcs from/to the node. You also use a node object to start an arc to another node. Same goes for removing an arc. Nodes also enjoy a member function which removes all arcs coming from other nodes by accessing other nodes' arc lists. One note about nodes I'll explain in implementation details is they also contain heads of the arc lists. They are part of the linked list structure. (which was a design mistake I made.)

The nodes are contained in a linked list (STL's list container). I first used a vector to store the nodes but when you remove a node all the consequent nodes in the vector are moved and all of their pointers change. Not surprisingly that breaks the whole arc mechanism. Node addresses should not change no matter what. But we also have a vector to match node addresses with node indices which allows random access.

Using a linked list have its fallbacks too. If all the new nodes are added to new (higher) memory locations that causes memory fragmentation if you add and remove new nodes frequently. A good memory allocation scheme would be to place new nodes in place of removed nodes. And STL's default list allocator seems to do just that, it adds a new node to the last place a node is deleted. So I didn't wrote an allocator.

continues on Digraph Code - II ...

Hiç yorum yok:

Yorum Gönder