18 Kasım 2012 Pazar

Digraph Code - II

continues from digraph code - i ...

Also you can access the source code on GitHub

Implementation


We have a directed graph class named digraph. It holds the nodes of the graph and provides necessary functions to add/remove nodes/arcs. It also provides functions to solve some graph theory related problems. (currently only shortest path problem on non-weighted graphs). Inside it is dgNode class which holds the nodes. If you want to use digraph class as a container your data is held inside these dgNode objects. Finally we have links. Every node is responsible for keeping the arcs it sends to other nodes but they also hold a list of incoming arcs for performance reasons.

Let's go from bottom to up. The arcLink class. It is used to store arcs in a linked list member. It can't work properly without dgNode class because top of the link is held there (which makes dgNode the linked list class actually). I guess I should have made all of it public to dgNode class and carried all of its functions there.

Right now, you can do everything that meddles with the middle of the list by using arcList objects but you have to do it from dgNode class if you need to do something concerning the top/beginning list member. I guess it works fine right now but later I'll make all arcList members public to dgNode and merge all the functions in arcLink with those in dgNode. Having two functions, one for normal members and one for top member is absurd. Also new arcs are inserted at second place (instead of top which would imply stack-like behaviour) and that way we can't trace the sequence of arcs properly.

An arcLink object know its place in the list. I thought that would be unnecessary, I even have a function getArc(int) which returns the arc in a specified index commented out. However it has one great advantage, you can tell how many links there are just by looking at the index of last element. (if you call getLastIndex() from an intermediate element it traverses to last element and returns its index. Constructors increment all following indices by one and destructors decrement by one.

Linked list elements are removed by removeArc() function in dgNode. Note that it is possible to have more than one arc from one node to another. removeArc() removes only one arc at a time. There is also a removeArcW() function to remove an arc with a specified weight. It is useless unless you have multiple arcs between nodes.

dgNode class is the class that holds digraph together. It does everything by using pointers. The whole digraph is connected together with pointers in fact. However it also holds an index value, which is useful in its parent class digraph

It has two functions which are kinda weak/dangerous. getArcToList() takes an array pointer (which is a dgNode pointer array) as argument and fills the array with the addresses of the nodes the nodes sends arcs to. If the array size is not set to lastArcToOther->getIndex() then you have a problem. The other one seems safer, getArcFromList() takes a vector pointer as argument and similarly fills the vector with node addresses. Size is not a concern here. For both functions you need to preferably create an array/vector dynamically and delete it when you are done. 

The digraph class itself does not concern itself with all the pointer stuff underneath. It assigns an index to each node when it is created and keeps the node's address in the vector indexList (at position node index). Once a node is deleted it's address is replaced by a zero (so if anyone tries to access it we can tell them node is deleted). Indices of deleted nodes are not re-assigned.

Nodes themselves are held in a linked list namely nodeList. It uses the default STL allocator. It places new nodes in place of deleted nodes so prevents memory fragmentation.

The only useful function solveShortestPath(int, int) we have at the moment finds the shortest path from one node to another in an un-weighted graph. Implementing weighted version should not be so hard (which I'll do sometime). It uses Dijkstra's method, you can read it on Wikipedia. I think comments in source are clear if you understood the algorithm in theory.

Problems I Encountered


I saved the best part for the last. Below are the three main problems which cost me so much precious time. But I learned some important stuff on OOP design in return I guess.

I used nested classes for the first time. That was a main cause of problems because, as it turns out, I wasn't accustomed to designing and using classes which hide their implementation details. Not being able to access private members was no problem since I would write all the classes public and global. I would take a copy or an instance of the object which contained the private member I had to use and access that private member as if it was public. You can't do that with private classes. I guess I took the whole encapsulation concept wrong before. You see, when you hide some members inside a class you do it because you don't want them to be accessed/changed without the supervision of the class.

Same thing goes for sub-classes. You make a class private so that that class is not accessed from outside world without the supervision of the parent class. If sub-classes are public and you use private object members of that class in the parent class; well, that doesn't make them private. In a big class hierarchy you can access all elements (including those supposed to be private) from the top class.

The right way to do it is, however, making sub-classes private (as opposed to having instances of those classes as private members). Private members of sub-class are shown to the parent class with public methods and outsiders or higher hierarchy classes (eg. grandparent class) can access them with public methods the parent class provides. It seems complex when I write it like this but in fact desinging stuff that way simplifies things. You just need to add some access functions on every level should you need to access inner private members.

One other interesting thing I encountered was the typename keyword. I used it when I wrote the map template for the stack maze code I mentioned above. I though it was used to define a variable which would contain the name of a class or a type which was interpreted at compile time. It seems it has another function. The compiler sometimes doesn't understand when you use nested names for classes (or more likely templates) like: myForest::myRBTree::myTreeNode *pNode. And as far as I understand the compiler/parser doesn't even need to understand (by standard) that we are defining a pointer to a node class which is nested deep in some other classes. When that happens we add the keyword typename to let the parser know we are defining a type in the next token. Well can't say I fully understand when parser fails to distinguish the typenames but I add the keyword typename before the nested type definition when I get a relevant compiler error.

A third thing which gave me pain was (I mean which was educative :) implementing my own linked list classes. I wanted to make my digraph template/container light-weigth so I decided not to use STL's list implementation. I was going to use the linked list class to hold arcs to other nodes. My error was, I did not write a full linked list class implementation. I only wrote a linked list member class (class arcLink in the code) and I stationed the list's initial nodes in its parent class (class dgNode). What follows is, linked list members can delete intermediate members themselves using member functions called from any (prior) object of the arcLink class. However when you need to delete the initial/beginning element you have to do it from the parent class, since you can't change the value of a pointer to an object inside a member function of that object. (well, I couldn't describe it I guess but I can write a short post about that later.) Shortly, my linked list implementation is divided between dgNode and arcLink classes. I should have made an arcList class which included arcLink class as public. (or maybe made arcLink's members public, so that dgNode would be the linked list class)

So the lesson I learned was; If you want to implement some basic data structure, either use some generic class/template or write a full class which can work on its own (without any info or commands coming from parent class). I did what I did just for the sake of saving some memory but the result was lots of lost time and not-so-easy-to-read bits of code laying all around the dgNode class.

Hiç yorum yok:

Yorum Gönder