The Sourcery: SymbolTableDiscussion

class Object {
  long long id;                // object id 
  int identity;                // object symbol
  list<Object> parents;        // parents
  map<string,Method*> methods; // methods
  set<Var> vars;               // attributes
  vector<Symbol> symbols;      // symbol table 
  int ref;                     // reference counter
  ...
}

class Method {
  int identity;                // object symbol
  int instruction_len;         // number of instructions in method
  int *inst;                   // method code 
  set<Var> vars;               // local variables 
  int ref;                     // reference counter
}

Someone: If I'm reading Tyche correctly, vnums and symbols solve two different purposes. vnums are there to reference certain categories of monsters, not individual monsters. When you spawn 10 orcs into a room, they all have the same vnum. But with Tyche's system, they have a different symbol.

Nod. Symbols are used by programmers/builders to manipulate objects. Diku does make the hard distinction between prototypical object and actual object. I really don't distinguish between any objects in the server. However the programmer or builder may flag such objects as prototypes.

Same someone: As to why we should look into systems other than the vnum system: A vnum merely exists to categorize things that the database can refer to them by a unique identifier. That is all. They serve no other practical purpose unless your game lacks the facilities to allow you to meaningfully and uniquely identify an object.

Exactly. And if you lack the mechanisms to uniquely identify an object, you end up with a poor interface to building/programming. At least I think .

It may make more sense if one considers it is a proto-OO design.

For example supposing this structure:

$thing
      \ 
       $monster
               \ 
                $orc_______
                    \      \ 
                     $orc_1 $orc_2 ...

All 5 of the objects above have OIDs and are full objects in their own right (there are no classes). Each specializes the interface of their parent object. There is an application level difference in that $orc be used as a factory of orcs by any reset/population code you have. in creating and designing areas, builders not generally be focused on the orcs that are auto_generated ($orc_1, etc.). They be interested more in the prototypes $monster and $orc.

@clone $monster as $troll gives...

$thing
     \ 
       $monster___________________
               \                  \ 
                $orc_______        $troll
                    \      \ 
                     $orc_1 $orc_2 ...

The builder then customizes the $troll to make it a troll and adds it to any reset/population code.

Or perhaps... @set $orc_1 somestat newstats (customize as needed) @rename $orc_1 $urukhai @set $urukhai flags #spawnable @set $troll flags #spawnable gives...

$thing
      \ 
       $monster___________________
               \                  \ 
                $orc_________      $troll
                    \        \           \ 
                     $urukhai $orc_2 ...  $troll_1 $troll_2 ...
                             \ 
                              $urukhai_1 ...

Or maybe I'd just like an easy way to remember a given mob... @rename $troll_1 $tychesfavoritetroll gives...

$thing
      \ 
       $monster___________________
               \                  \ 
                $orc_________      $troll
                    \        \           \ 
                     $urukhai $orc_2 ...  $tychesfavoritetroll $troll_2 ...
                             \ 
                              $urukhai_1 ...

Consider the ease use of symbols in writing code/scripts (even mobprogs - ick)

mob mload $orc
mob remove $sword all
if ischild $orc
  mob say Ew! You're a stinking orc.
endif

as opposed to...

mob mload 12335
mob remove 435 all
if vnum $i == 12335
  mob say Ew! You're a stinking orc.
endif

Or just maybe instead of resets, you're a mad scientist and rather use genetic code and have your mobs physically spawning/mating across the fruited and fertile plain of your mud. you write your own specialized cloning code and symbol generator.

$genusmordor___________________________       
    \               \                  \ 
     $orc            $urukhai           $olaghai
         \                   \                  \ 
          $orc_1 ...          $urikhai_1 ...     $olaghai_1 ...
                \            /          \         /
                 $orc_urikhai_gen_1_AEBDJ $urikhai_olaghai_gen_1_ADBCJ
                                \                         \ 
                                $orc_urikhai_gen_2_AEBAJ  $urikhai_olaghai_gen_2_ADBBJ

Where the symbols generated contain parentage, generational info and some sort of genetic marker sequence.

Eventually Saruman the mad builder upon finding a desireable orcish strain chooses to name it and breed from it...

@rename $urikhai_olaghai_gen_24_ADSFJ $whitehandorc

Oh yeah while I'm have fun posting ascii pictures, I ran across this in my documents repository, which illustrates some other mechanisms I use to generate memorable symbol names.

               accepts
$telnetdaemon --------> $connection
                            \ 
                             \ creates
        make new user         \>
$user <---------------------   $connection_10029383844 ... 
    \                          .   (uses timestamp in symbol)
     \ creates                 .
      \>(uses name in symbol)  .   make character
       $user_tyche ... $user_fred ------------>$character
                               .                /
                                .........      / creates
                                         .   </
                                          ...$char_frodo_baggins $char_samwise_gamgee ...

More on the variation that: All objects are equal (but some objects are more equal than others)

Xeon: Why not just VNUM's? They're much easier to remember. Plus it's faster for the computer to deal with integers than whatever the hell you're trying to do, I assume.

First it goes without saying that parsing 'mload 9273' and '@load $orc' are equivalent. Both are text. The former requires translation into an integer (e.g. atoi()), the latter requires translation into a symbol table, either creating a new entry or using an existing entry.

Now at the risk of further confusion maybe a few words about how symbol tables are designed. Forget about OIDs for the moment. The dirty little secret is that symbols are also stored as numbers internally! It might be helpful to grab a compiler design book and read about the various algorithms used to create, store and retrieve symbols, and the reasoning behind them. I use at least three different algorithms depending on the scope and use of the symbol, whether it's scoped globally, at the object level, or at method level (functional scope). The symbols I've been describing are globally scoped, and used when referencing objects.

The mechanism for global symbols ($orc, $monster, etc) is as follows:

class symbol_entry {
public:
  char *s;   // symbol string name
  int ref;   // reference counter
  int hash;  // symbols hash value
  int next;  // next symbol's subscript in symbol_table with the same hash value
};

symbol_entry *symbol_table;  // the global symbol table (array)
int *symbol_hashtable;       // the hash table for the global symbol table (array)
int free_symbols;            // subscript to first unused symbol

Both the symbol_hashtable and symbol_table arrays are allocated to some configurable size as well as supporting expansion if needed at runtime. When the server boots the symbol_hashtable is initialized to nulls and the symbol_table is initialize with each next entry pointing to the next subscript (i.e. symbol_table[i].next = ++i;). Then the symbols are read in from the database and entered into the tables.

Adding a symbol: The symbol string is hashed and the symbol_hashtable[hash_value] is checked. If it's null we then fill in symbol_table[free_symbols] - set s to the text. - set symbol_hashtable[hash_value] to free_symbols - set free_symbols to next, next to null - set hash to hash_value. - set ref to 1. If it's not null we loop through symbol_table[symbol_hashtable[hash_value]] - using next as the increment until next is null comparing the strings with s - to find a match - If we find a match we increment the ref counter - if not we fill in the free_symbols entry as above and update our subscript pointers.

Deleting a symbol: - checks the reference counter and puts it back on the free_symbols list if unreferenced

Now the subscript into the symbol_table is stored on the Object and used internally:

class Object {
  long long id;                // object id 
  int identity;                // object symbol <-------HERE
  list<Object> parents;        // parents
  map<string,Method*> methods; // methods
  set<Var> vars;               // attributes
  vector<Symbol> symbols;      // symbol table 
  int ref;                     // reference counter
  ...
}

So under the covers symbols are integers and integer math is used internally. They are also unique at runtime, but NOT persistent.

The peculiar use of an hash table array to point to what amounts to be an array subscript-based link list can be found in Sedgewicks "Algorithm's in C: Parts 1-4". There are a number of reasons why I'd chose this structure. One better performance on some traversals as page faults are minimized, and minimizes node allocations. Two having reverse pointers (subscripts) solves the problem of allowing lookup from both the text version of the symbol and the integer version of the symbol. Thirdly the subscript into symbol_table is guaranteed to be unique. And lastly but most importantly...

Consider the problem with @rename $orc $troll. I could have lots of references out there using the symbol $orc. To cycle through all the objects and all their naughty parts updating the symbols is completely unacceptable. However since they all use the same subscript in symbol_table all I have to change is the string there and viola. It's almost too easy.

And back to OIDs (or UIDs)... Why don't I use the symbol integers as the OIDs and why are they transient? Well the rename problem above is reason enough. In renaming an object I do not wish to change it's unique identity. After all it hasn't changed just the symbolic name has. Changing the actual identity of the object (OID) requires database access. Also I have some other reasons that have to do with inheritance features and runtime polymorphism.

Again nothing above illustrates how OIDs are associated with symbols. That involves another piece of code and associated structures and if anyone's interested I can elaborate on that.

Here's a snapshot of how this structure might look in memory with very tiny table sizes and containing only three symbols

symbol_hashtable
[0] = unused entry
[1] = 2               // troll and tyche are assumed to produce hash collision here
[2] = 1               // orc hashes to here
[3] = 0
[4] = 0
[5] = 0
  
symbol_table
      ref hash next  s
[0] = unused entry
[1] = 37   2    0   $orc   
[2] =  1   1    3   $troll
[3] = 14   1    0   $tyche  
[4] =  0   0    5 
[5] =  0   0    6
...

free_symbols = 4

One could also optionally used -1 as a "special value" instead of zero instead of using subscript [0] as special unused placeholder.

Lindahlb: You seem to have left out one important detail. Taking the symbol and translating it into something useful (an object, I assume).

Yes I mentioned leaving that out. I was responding to the assertion that Symbol comparisons be prohibitively expensive, (i.e. string vs. integer).

Lindahlb: However, I assume your class, 'symbol_entry', should contain some data member (such as ObjRef?), that will be returned when a symbol is queried for its meaningful object. If , I assume this be a list of ObjRefs?, considering your symbols can pertain to a multitude of objects (as per your 'ref' data member). And the returned result from a query of a symbol return a list of all objects with such a symbol (perhaps having two queries, one returning a list, the other returning the first object, depending on the symbol's use).

Am I correct in my assumptions? Or is this symbol table being used for something else?

Well the ref field is used for reference counting for Symbol usage and doesn't refer to an Object. It's necessary for proper deletion semantics. The VM acquires symbols and can't get rid of a Symbol until it's completely finished with it.

The missing piece of the relationship between symbols and OIDs is the province of the ObjectManager?. I operate on the design premise that Objects don't have to exist in memory. It's the job of the ObjectManager? to load, unload and otherwise manage Objects as needed. It provides transactional consistency and is the persistent store. Within the ObjectManager? are several caches. One of those is the NameCache?. I've mentioned there is a separate database for symbols; it contains only the following key/value pairs, the OID and the symbol text. That is what is used to populate the symbol_table at boot up. The NameCache? front-ends that database.

class NameCacheEntry {
  long long id;      // object id 
  int identity;      // symbol 
  bool dirty;        // cache write flag 
  int  usecount      // cache hit counter
};

Symbol resolution to OID is done at the very last moment possible. In many cases there's no need to associate a Symbol with an OID, as only the existence needs to be validated rather than the actual object accessed. Only when the object is actually in memory and must be accessed does the actual association need to be made. During the course of using and modifying the symbol_table messages are sent to the ObjectManager? when names are modified. The ObjectManager? runs asynchronously in it's own thread pool. this decoupling of tables is done to maximize performance by minimizing data locking. In fact it is done in a way that neither thread (VirtualMachine? or ObjectManager?) has to ever obtain a lock on the others table. :-!

Now if one is operating under a different design premise, either a single-threaded design and/or one that demands that all Objects must exist in memory, it probably make sense merge these structures together within the symbol_table itself.