Scroll to navigation

LIBCDT(3) Library Functions Manual LIBCDT(3)

NAME

Cdt - container data types

SYNOPSIS

#include <cdt.h>

DICTIONARY TYPES

Void_t*;
Dt_t;
Dtdisc_t;
Dtmethod_t;
Dtlink_t;
Dtstat_t;

DICTIONARY CONTROL

Dt_t*       dtopen(const Dtdisc_t* disc, const Dtmethod_t* meth);
int         dtclose(Dt_t* dt);
void        dtclear(dt);
Dtdisc_t*   dtdisc(Dt_t* dt, const Dtdisc_t* disc, int type);
Dtmethod_t* dtmethod(Dt_t* dt, const Dtmethod_t* meth);
Dt_t*       dtview(Dt_t* dt, Dt_t* view);
int         dtcustomize(Dt_t* dt, int type, Void_t* arg);
int         dtoptimize(Dt_t* dt);
int         dtshare(Dt_t* dt, int type);
int         dtlock(Dt_t* dt, unsigned int key, int type);

STORAGE METHODS

Dtmethod_t* Dtset;
Dtmethod_t* Dtbag;
Dtmethod_t* Dtrhset;
Dtmethod_t* Dtrhbag;
Dtmethod_t* Dtoset;
Dtmethod_t* Dtobag;
Dtmethod_t* Dtlist;
Dtmethod_t* Dtstack;
Dtmethod_t* Dtqueue;
Dtmethod_t* Dtdeque;

DISCIPLINE

#define DTOFFSET(struct_s,member)
#define DTDISC(disc,key,size,link,makef,freef,comparf,hashf,memoryf,eventf)
typedef Void_t*      (*Dtmake_f)(Dt_t*, Void_t*, Dtdisc_t*);
typedef void         (*Dtfree_f)(Dt_t*, Void_t*, Dtdisc_t*);
typedef int          (*Dtcompar_f)(Dt_t*, Void_t*, Void_t*, Dtdisc_t*);
typedef unsigned int (*Dthash_f)(Dt_t*, Void_t*, Dtdisc_t*);
typedef Void_t*      (*Dtmemory_f)(Dt_t*, Void_t*, size_t, Dtdisc_t*);
typedef int          (*Dtevent_f)(Dt_t*, int, Void_t*, Dtdisc_t*);

OBJECT OPERATIONS

Void_t*   dtinsert(Dt_t* dt, Void_t* obj);
Void_t*   dtappend(Dt_t* dt, Void_t* obj);
Void_t*   dtdelete(Dt_t* dt, Void_t* obj);
Void_t*   dtattach(Dt_t* dt, Void_t* obj);
Void_t*   dtdetach(Dt_t* dt, Void_t* obj);
Void_t*   dtsearch(Dt_t* dt, Void_t* obj);
Void_t*   dtmatch(Dt_t* dt, Void_t* key);
Void_t*   dtfirst(Dt_t* dt);
Void_t*   dtnext(Dt_t* dt, Void_t* obj);
Void_t*   dtlast(Dt_t* dt);
Void_t*   dtprev(Dt_t* dt, Void_t* obj);
Void_t*   dtleast(Dt_t* dt, Void_t* obj);
Void_t*   dtmost(Dt_t* dt, Void_t* obj);
int       dtwalk(Dt_t* dt, int (*userf)(Dt_t*, Void_t*, Void_t*), Void_t*);
Dtlink_t* dtflatten(Dt_t* dt);
Dtlink_t* dtlink(Dt_t* dt, Dtlink_t* link);
Void_t*   dtobj(Dt_t* dt, Dtlink_t* link);
Dtlink_t* dtextract(Dt_t* dt);
Dtlink_t* dtrestore(Dt_t* dt, Dtlink_t* link);

DICTIONARY STATUS

Dt_t*     dtvnext(Dt_t* dt);
ssize_t   dtvcount(Dt_t* dt);
Dt_t*     dtvhere(Dt_t* dt);
ssize_t   dtsize(Dt_t* dt);
ssize_t   dtstat(Dt_t* dt, Dtstat_t* st);

HASH FUNCTIONS

unsigned int dtstrhash(unsigned int h, char* str, int n);
unsigned int dtcharhash(unsigned int h, unsigned char c);

DESCRIPTION

Cdt manages run-time dictionaries using standard container data types: unordered set/multiset, ordered set/multiset, list, stack, and queue.

DICTIONARY TYPES

Void_t*

This type is used to pass objects between Cdt and application code. Void_t is defined as void for ANSI-C and C++ and char for older C compilation environments.

Dt_t

This is the type of a dictionary handle.

Dtdisc_t

This defines the type of a discipline structure which define the lay-out of an object and functions to compare, hash, make, delete objects, etc. (see dtdisc()).

Dtmethod_t

This defines the type of a container method.

This is the type of a dictionary object holder (see dtdisc().)

Dtstat_t

This is the type of a structure to return dictionary statistics (see dtstat().)

DICTIONARY CONTROL

Dt_t* dtopen(const Dtdisc_t* disc, const Dtmethod_t* meth)

This creates a new dictionary. disc is a discipline structure to describe object format. meth specifies a manipulation method. dtopen() returns the new dictionary or NULL on error. See also the events DT_OPEN and DT_ENDOPEN below.

int dtclose(Dt_t* dt)

This deletes dt and its objects. Note that dtclose() fails if dt is being viewed by some other dictionaries (see dtview()). dtclose() returns 0 on success and -1 on error. See also the events DT_CLOSE and DT_ENDCLOSE below.

void dtclear(Dt_t* dt)

This deletes all objects in dt without closing dt.

Dtdisc_t* dtdisc(Dt_t* dt, const Dtdisc_t* disc, int type)

If disc is NULL, dtdisc() returns the current discipline. Otherwise, it changes the discipline of dt to disc. Objects may be rehashed, reordered, or removed as appropriate. type can be any bit combination of DT_SAMECMP and DT_SAMEHASH. DT_SAMECMP means that objects will compare exactly the same as before thus obviating the need for reordering or removing new duplicates. DT_SAMEHASH means that hash values of objects remain the same thus obviating the need to rehash. dtdisc() returns the previous discipline on success and NULL on error.

Dtmethod_t dtmethod(Dt_t* dt, const Dtmethod_t* meth)

If meth is NULL, dtmethod() returns the current method. Otherwise, it changes the storage method of dt to meth. Objects may be rehashed, reordered, or removed as appropriate. dtmethod() returns the previous method or NULL on error.

Dt_t* dtview(Dt_t* dt, Dt_t* view)

A viewpath allows a search or walk starting from a dictionary to continue to another. dtview() first terminates any current view from dt to another dictionary. Then, if view is NULL, dtview returns the terminated view dictionary. If view is not NULL, a viewpath from dt to view is established. dtview() returns dt on success and NULL on error.

It is an error to have dictionaries on a viewpath with different storage methods. In addition, dictionaries on the same view path should treat objects in a consistent manner with respect to comparison or hashing. If not, undefined behaviors may result.

int dtcustomize(Dt_t* dt, int type, Void_t* arg)

This customizes a storage method. The type argument indicates the type of customization and arg gives additional information for the operation. Here are the types:

This turns on/off the share mode for a dictionary. Concurrent accesses of a dictionary not in share mode may exhibit undefined behaviors including memory segmentation.

Share mode allows multiple accessors, threads or processes, to access objects. Such objects could be in the same directory in the case of threads or shared memory in the case of processes.

This causes the underlying method to optimize its internal data structure. For example, the splay tree underlying Dtoset would be balanced.

int dtoptimize(Dt_t* dt)

This is a short-hand for invoking dtcustomize() with the DT_OPTIMIZE event.

int dtshare(Dt_t* dt, int type)

This turns on or off share mode for dictionary dt depending on whether type is positive or non-positive. It returns -1 on failure.

int dtlock(Dt_t* dt, unsigned int key, int type)

This globally locks/unlocks a dictionary using the given key. It returns 0 on success and -1 on failure. The value of key must not be 0. The argument type is used as follows:

Unlock the dictionary if it was locked with key. An error will result if the dictionary was locked with a different key.
Attempt to lock the dictionary with key if it is unlocked. An error will result if the dictionary was already locked with a different key.
Attempt to lock the dictionary with key. If the dictionary is already locked with a different key, the call will loop and wait until the lock is open to lock it.

STORAGE METHODS

Storage methods are of type Dtmethod_t*. Cdt supports the following methods:

Dtoset

Dtobag

Objects are ordered by comparisons. Dtoset keeps unique objects. Dtobag allows repeatable objects.

Dtset

Dtbag

Objects are unordered. Dtset keeps unique objects. Dtbag allows repeatable objects. The underlying data structure is a hash table with chaining to handle collisions.

Dtrhset

Dtrhbag

These methods are like Dtset and Dtbag but are based on a recursive hashing data structure that allows table extension without object relocation. The data structure also supports lock-free concurrent search operations for share dictionaries.

Dtlist

Objects are kept in a list. A current object is always defined to be either the head of the list or an object resulting from a recent search or insert operation. The call dtinsert() will insert a new object in front of such a current object while the call dtappend() will append in back of it.

Dtdeque

Objects are kept in a deque. This is similar to Dtlist except that objects are always inserted at the front and appended at the tail of the list.

Dtstack

Objects are kept in a stack, i.e., in reverse order of insertion. Thus, the last object inserted is at stack top and will be the first to be deleted.

Dtqueue

Objects are kept in a queue, i.e., in order of insertion. Thus, the first object inserted is at queue head and will be the first to be deleted.

DISCIPLINE

Object format and associated management functions are defined in the type Dtdisc_t:


typedef struct
{ ssize_t key, size;
ssize_t link;
Dtmake_f makef;
Dtfree_f freef;
Dtcompar_f comparf;
Dthash_f hashf;
Dtmemory_f memoryf;
Dtevent_f eventf;
} Dtdisc_t;

ssize_t key, size

Each object obj is identified by a key used for object comparison or hashing. key should be non-negative and defines an offset into obj. If size is negative, the key is a null-terminated string with starting address *(Void_t**)((char*)obj+key). If size is zero, the key is a null-terminated string with starting address (Void_t*)((char*)obj+key). Finally, if size is positive, the key is a byte array of length size starting at (Void_t*)((char*)obj+key).

Let obj be an object to be inserted into dt. If link is negative, an object holder of type Dtlink_t will be allocated to hold obj. Otherwise, obj should have a Dtlink_t structure embedded link bytes into it, i.e., at address (Dtlink_t*)((char*)obj+link).

Void_t* (*makef)(Dt_t* dt, Void_t* obj, Dtdisc_t* disc)

If makef is not NULL, dtinsert(dt,obj) or dtappend() will call it to make a copy of obj suitable for insertion into dt. If makef is NULL, obj itself will be inserted into dt.

void (*freef)(Dt_t* dt, Void_t* obj, Dtdisc_t* disc)

If not NULL, freef is used to destroy data associated with obj.

int (*comparf)(Dt_t* dt, Void_t* key1, Void_t* key2, Dtdisc_t* disc)

If not NULL, comparf is used to compare two keys. Its return value should be <0, =0, or >0 to indicate whether key1 is smaller, equal to, or larger than key2. All three values are significant for method Dtoset and Dtobag. For other methods, a zero value indicates equality and a non-zero value indicates inequality. If (*comparf)() is NULL, an internal function is used to compare the keys as defined by the Dtdisc_t.size field.

unsigned int (*hashf)(Dt_t* dt, Void_t* key, Dtdisc_t* disc)

If not NULL, hashf is used to compute the hash value of key. It is required that keys compared equal will also have same hash values. If hashf is NULL, an internal function is used to hash the key as defined by the Dtdisc_t.size field.

Void_t* (*memoryf)(Dt_t* dt, Void_t* addr, size_t size, Dtdisc_t* disc)

If not NULL, memoryf is used to allocate and free memory. When addr is NULL, a memory segment of size size is requested. If addr is not NULL and size is zero, addr is to be freed. If addr is not NULL and size is positive, addr is to be resized to the given size. If memoryf is NULL, malloc(3) is used.

int (*eventf)(Dt_t* dt, int type, Void_t* data, Dtdisc_t* disc)

If not NULL, eventf announces various events. Each event may have particular handling of the return values from eventf. But a negative return value typically means failure. Following are the events:

This event is raised at the start of the process to open a new dictionary. The argument data will be a pointer to an object of type Void_t* initialized to NULL before the call. The return value of eventf() is significant as follows:

On a negative return value, dtopen() will return failure.

On a zero return value, eventf() may set *(Void_t**)data to some non-NULL value to indicate that the dictionary structure itself should be allocated along with the Dtdisc_t.data section. Otherwise, it will be allocated separately with malloc(3).

On a positive return value, the dictionary is being reconstructed based on existing states of some previous dictionary. In this case, eventf() should set *(Void_t**)data to point to the field Dt_t.data of the corresponding previous dictionary (see DT_CLOSE below). If the handle of the previous dictionary was created as discussed above in the case of the zero return value, it will be exactly restored. Otherwise, a new handle will be allocated with malloc(). The ability to create different dictionaries sharing the same set of objects allows for managing objects in shared and/or persistent memory.

This event is raised at the end of the process to open a dictionary. The return value of eventf() will be ignored.
This event is raised at the start of the process to close dictionary dt. The return value of eventf is significant as follows:

On a negative return value, dtclose() will return failure.

On a zero return value, all dictionary objects will be deleted and and all associated memory will be freed.

On a positive return value, allocated objects and memory will be kept intact. This means that dt->data remains intact and can be reused in some future dictionary (see DT_OPEN above). Note, however, that dt itself would still be freed if it was allocated with malloc(3).

This event is raised at the end of the process to close a dictionary. The return value of eventf() will be ignored.
The discipline of dt is being changed to a new one given in (Dtdisc_t*)data.
The method of dt is being changed to a new one given in (Dtmethod_t*)data.
This event is applicable to the methods Dtset, Dtbag, Dtrhset and Dtrhbag. It is typically issued when the respective internal data structure of a method is about to be initialized. If the return value of the event handling function is positive, *(ssize_t*)data is examined for further action; else, it is ignored. A positive return value means that the event function wishes to suggest a table size. It does that by setting *(ssize_t*)data to the desired size. Then, the actual table size will be the maximum of the absolute value of *(ssize_t*)data and some predefined value set by the method. In addition, if *(ssize_t*)data was negative, the Dtset and Dtbag methods will never resize the hash table.
This event announces an error that occurred during some operations. The argument (char*)data is a null-terminated string describing the error.

#define DTOFFSET(struct_s,member)

This macro function computes the offset of member from the start of structure struct_s. It is useful for getting the offset of a Dtlink_t embedded inside an object.

#define DTDISC(disc,key,size,link,makef,freef,comparf,hashf,memoryf,eventf)

This macro function initializes the discipline pointed to by disc with the given values.

OBJECT OPERATIONS

Void_t* dtinsert(Dt_t* dt, Void_t* obj)

Void_t* dtappend(Dt_t* dt, Void_t* obj)

These functions add an object prototyped by obj into dt. dtinsert() and dtappend() perform the same function for all methods except for Dtlist (see Dtlist for details). If there is an existing object in dt matching obj and the storage method is Dtset, Dtrhset or Dtoset, dtinsert() and dtappend() will simply return the matching object. Otherwise, a new object is inserted according to the method in use. See Dtdisc_t.makef for object construction. The new object or a matching object as noted will be returned on success while NULL is returned on error.

Void_t* dtdelete(Dt_t* dt, Void_t* obj)

If obj is NULL, methods Dtstack and Dtqueue delete respectively stack top or queue head while other methods do nothing. If obj is not NULL, an object matching obj is deleted. See Dtdisc_t.freef for object destruction. dtdelete() returns the deleted object (even if it was deallocated) or NULL on error.

Void_t* dtattach(Dt_t* dt, Void_t* obj)

This function is similar to dtinsert() but obj itself will be inserted into dt even if a discipline function makef is defined.

Void_t* dtdetach(Dt_t* dt, Void_t* obj)

This function is similar to dtdelete() but the object to be deleted from dt will not be freed (via the discipline freef function).

Void_t* dtsearch(Dt_t* dt, Void_t* obj)

Void_t* dtmatch(Dt_t* dt, Void_t* key)

These functions find an object matching obj or key either from dt or from some dictionary accessible from dt via a viewpath (see dtview().) dtsearch() and dtmatch() return the matching object or NULL on failure.

Void_t* dtfirst(Dt_t* dt)

Void_t* dtnext(Dt_t* dt, Void_t* obj)

dtfirst() returns the first object in dt. dtnext() returns the object that follows an object matching obj. Objects are ordered based on the storage method in use. For Dtoset and Dtobag, objects are ordered by object comparisons. For Dtstack, objects are ordered in reverse order of insertion. For Dtqueue, objects are ordered in order of insertion. For Dtlist, objects are ordered by list position. For Dtset, Dtbag, Dtrhset and Dtrhbag, objects are ordered by some internal order defined at the time when these functions are called.

Objects in a dictionary or a viewpath can be walked using a for(;;) loop as below.


for(obj = dtfirst(dt); obj; obj = dtnext(dt,obj))

Void_t* dtlast(Dt_t* dt)

Void_t* dtprev(Dt_t* dt, Void_t* obj)

dtlast() and dtprev() are like dtfirst() and dtnext() but work in reverse order. For Dtset, Dtbag, Dtrhset and Dtrhbag, both reverse and forward orders are the same. Note that dictionaries on a viewpath are still walked in the order of the viewpath.

Void_t* dtleast(Dt_t* dt, Void_t* obj)

Void_t* dtmost(Dt_t* dt, Void_t* obj)

dtleast() returns the smallest object greater or equal to obj. dtmost() returns the largest object smaller or equal to obj. Again, object ordering depends on the storage method in use. For example, with Dtoset and Dtobag, the ordering of objects is well-defined and it is possible to call dtleast() or dtmost() on an object not in the dictionary and still get a meaningful result. On the other hand, with Dtset or Dtrhset, such a call will essentially be the same as dtsearch() because without matching an object, it cannot be determined what comes before or after.

dtwalk(Dt_t* dt, int (*userf)(Dt_t*, Void_t*, Void_t*), Void_t* data)

This function calls (*userf)(walk,obj,data) on each object in dt and other dictionaries viewable from it. walk is the dictionary containing obj. If userf() returns a <0 value, dtwalk() terminates and returns the same value. dtwalk() returns 0 on completion.

Using dtfirst()/dtnext() or dtlast()/dtprev() to walk a single dictionary can incur significant cost due to function calls. For efficient walking of a single directory (i.e., no viewpathing), dtflatten() and dtlink() can be used. Objects in dt are made into a linked list and walked as follows:


for(link = dtflatten(dt); link; link = dtlink(dt,link) )

Note that dtflatten() returns a list of type Dtlink_t*, not Void_t*. That is, it returns a dictionary holder pointer, not a user object pointer (although both are the same if the discipline field link is zero.) The macro function dtlink() returns the dictionary holder object following link. The macro function dtobj(dt,link) returns the user object associated with link, Beware that the flattened object list is unflattened on any dictionary operations other than dtlink().

dtextract() extracts the list of objects from dt and makes it appear empty. dtrestore() repopulates dt with a list of objects previously obtained via dtextract(). It is important that the same discipline and method are in use at both extraction and restoration. Otherwise, undefined behaviors may result. These functions return NULL on error.

DICTIONARY INFORMATION

Dt_t* dtvnext(Dt_t* dt)

This returns the dictionary that dt is viewing, if any.

ssize_t dtvcount(Dt_t* dt)

This returns the number of dictionaries that view dt.

Dt_t* dtvhere(Dt_t* dt)

This returns the dictionary v viewable from dt where an object was found from the most recent search or walk operation.

ssize_t dtsize(Dt_t* dt)

This function returns the number of objects stored in dt.

ssize_t dtstat(Dt_t *dt, Dtstat_t* st)

This function reports dictionary statistics. It returns the number of objects stored in dt.

Dtstat_t contains the below fields:

This returns the method used for the dictionary, e.g., DT_SET, DT_OSET, etc.
This has the number of objects in the dictionary.
This returns the maximum number of levels in the data structure used for object storage, i.e., the binary tree or the recursive hash table. For a hash table with chaining (i.e., Dtset and Dtbag), it gives the length of the longest chain.
This gives the object counts at each level. For a hash table with chaining (i.e., Dtset and Dtbag), a level is defined as objects at that position in their chains. Since chains can be arbitrarily long, the report is limited to objects at a level less than DT_MAXSIZE.
For a hash table using a trie structure, this counts the number of sub-tables at each level. For example, tsize[0] should be 1 only for this hash table type.

HASH FUNCTIONS

unsigned int dtcharhash(unsigned int h, char c)

unsigned int dtstrhash(unsigned int h, char* str, int n)

These functions compute hash values from bytes or strings. dtcharhash() computes a new hash value from byte c and seed value h. dtstrhash() computes a new hash value from string str and seed value h. If n is positive, str is a byte array of length n; otherwise, str is a null-terminated string.

IMPLEMENTATION NOTES

Dtlist, Dtstack and Dtqueue are based on doubly linked list. Dtoset and Dtobag are based on top-down splay trees. Dtset and Dtbag are based on hash tables with move-to-front collision chains. Dtrhset and Dtrhbag are based on a recursive hashing data structure that avoids table resizing.

AUTHOR

Kiem-Phong Vo, kpv@research.att.com