Objects already treated

BDb (htlib)

Replaces DB2

Configuration, htlib

Compiled, but not completed yet.
Uses UnicodeString::getBuffer (maybe questionable).

DocMatch, htsearch

Header only.

HtURLRewrite, htlist

Singleton, with the URL rewrite rules.

Parser, htsearch/parser

Compiled.
Done with QuotedStringList
Next handle the List argument of checkSyntax, invoked from htsearch.cc, maybe a list<WeightWord>? No... These are UnicodeStrings, so either a list or a vector of them. Maybe done.
Made result a vector<ResultList> (was a List*). Let's take the vector, because of operator[]
Now, WordList: done.
So continuing in parser.
Made partial changes for Berkeley Db... Also in the Makefile, but not in the template.
perform_push: I don't understand what should happen there. The role of wildcard is not clear.
The key is not checked!?
I'm just slurping the db...
Not sure where I push to...
In the old code, was p the key or the data? And what is the data? an index, convertible to an int? It depends on the db...
I decide to assume that the key is a (now Unicode) string and that the data in doc_index is (convertible to) an int (in dbf, it is a WordRecord).
temp is compared to the key.
Not sure what to store as the key, from the unicode string. What about the value returned from getBuffer?
I have already done that in WordList and Configuration...
Note that the key is truncated to maximum_word_length.

qstrings (was: QuotedStringList), htlib

Done, for Parser.
This is a vector rather than a list.

ResultList, htsearch

Inherited from Dictionary (deleted)
map<char const*, DocMatch>
Only const members... May be a problem?
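
One thing to watch with a char const* key: by default std::map orders such keys by address, not by string content, so a lookup through a different buffer holding the same text misses. A minimal sketch of a content-based comparator follows. DocMatchSketch and found_via_copy are hypothetical names for illustration, not the real DocMatch or any ht://Dig function.

```cpp
#include <cassert>
#include <cstring>
#include <map>
#include <string>

// Hypothetical stand-in for the real DocMatch; only a score is kept here.
struct DocMatchSketch {
    double score = 0.0;
};

// Orders C strings by content instead of by address.
struct CStrLess {
    bool operator()(const char* a, const char* b) const {
        assert(a != nullptr && b != nullptr);
        return std::strcmp(a, b) < 0;
    }
};

// The map does not own its keys: each pointer must outlive its entry.
using ResultsByContent = std::map<const char*, DocMatchSketch, CStrLess>;

// Looks up a key held in a different buffer than the one used at insertion.
inline bool found_via_copy(const ResultsByContent& results,
                           const std::string& key) {
    return results.find(key.c_str()) != results.end();
}
```

Keying on std::string (or UnicodeString) would avoid both the comparator and the key lifetime question, at the cost of copying the keys.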

Stack, htlib

Deleted. In parser, replaced with a vector<ResultList>, since stack::pop doesn't return anything.
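
A minimal sketch of using a vector as a stack, with a pop that returns the removed element. pop_top is a hypothetical helper name, not something from the ht://Dig sources.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Pops and returns the top element of a vector used as a stack.
// Precondition: the vector is not empty.
template <typename T>
T pop_top(std::vector<T>& stack) {
    assert(!stack.empty());
    T top = std::move(stack.back());  // take the last element
    stack.pop_back();                 // then remove it
    return top;
}
```

push_back plays the role of push, and operator[] remains available for looking below the top.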

strlist, htlib

vector<UnicodeString> with parse/split constructor.
Skips white space and punctuation.
Use through iterators.
Uses vector in order to have operator[]
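
A rough sketch of such a split constructor. To stay buildable without ICU, std::string stands in for UnicodeString, so the character classification is ASCII-only here; the real class would need Unicode-aware tests. StrListSketch is a hypothetical name.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for strlist: splits text into words, skipping white space
// and punctuation. ASCII classification only (assumption of this sketch).
struct StrListSketch {
    std::vector<std::string> items;

    explicit StrListSketch(const std::string& text) {
        std::string current;
        for (unsigned char c : text) {
            if (std::isspace(c) || std::ispunct(c)) {
                // A separator ends the current word, if there is one.
                if (!current.empty()) {
                    items.push_back(current);
                    current.clear();
                }
            } else {
                current.push_back(static_cast<char>(c));
            }
        }
        if (!current.empty()) items.push_back(current);
    }

    // Iterator access and indexing are forwarded to the vector.
    std::vector<std::string>::const_iterator begin() const { return items.begin(); }
    std::vector<std::string>::const_iterator end() const { return items.end(); }
    std::size_t size() const { return items.size(); }
    const std::string& operator[](std::size_t i) const {
        assert(i < items.size());
        return items[i];
    }
};
```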

WeightWord, htsearch

Done, for use in htsearch

WordList, htcommon

Done, for htsearch
Built around a map<UnicodeString, WordReference>
For valid_word:

alpha> ./alpha foo
text: foo
alpha1: 1, alpha2: 1
alpha> ./alpha foo2
text: foo2
alpha1: 0, alpha2: 0
alpha> ./alpha foo!
text: foo!
alpha1: 0, alpha2: 0
alpha> ./alpha таня
text: таня
alpha1: 1, alpha2: 1
alpha> ./alpha foo_bar
text: foo_bar
alpha1: 0, alpha2: 0
alpha> ./alpha foo-bar
text: foo-bar
alpha1: 0, alpha2: 0
alpha> ./alpha Épaminondas
text: Épaminondas
alpha1: 1, alpha2: 1
Done (small doubt about u_fopen_u, which may not support the append mode of fopen, although it is a wrapper, so it should work).
Otherwise, open with 'r+' (read/write; 'rw' is not a standard fopen mode) and fseek(fl, 0, SEEK_END);
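
The fallback can be sketched with plain fopen, on the assumption that the wrapper passes the mode string through unchanged. open_at_end is a hypothetical helper name. Note that 'r+' fails when the file does not exist yet, so the sketch then tries to create it.

```cpp
#include <cassert>
#include <cstdio>

// Opens `path` for update and positions the stream at the end, as a
// substitute for append mode. Returns nullptr on failure.
inline std::FILE* open_at_end(const char* path) {
    assert(path != nullptr);
    std::FILE* fl = std::fopen(path, "r+");  // existing file, not truncated
    if (fl == nullptr) {
        // Most likely the file does not exist yet: try to create it.
        fl = std::fopen(path, "w+");
    }
    if (fl != nullptr && std::fseek(fl, 0, SEEK_END) != 0) {
        std::fclose(fl);
        return nullptr;
    }
    return fl;
}
```

Unlike true append mode, a later fseek can move the write position away from the end of the file.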

WordRecord, htcommon

Record stored in the db.
Used in parser
Not done.

WordReference, htcommon

Header-only struct. Done.

HtWordType, htlib

Just superficial changes to compile WordList
The issue of valid_punctuation, from htcommon/defaults.cc, is unclear. Ignored in HtStripPunctuation.

htsearch

Parser: done.

IntObject, htlib

At least header only. Probably not needed at all. Not referenced explicitly. Deleted.

Functions, examples

mystrncasecmp / mystrcasecmp


	if (mystrncasecmp(word, "exact:", 6) == 0)
	{
	    word += 6;
	    isExact = 1;
	}
becomes:

  // Case-insensitive prefix test, like mystrncasecmp(word, "exact:", 6).
  if (str.caseCompare(0, 6, UnicodeString("exact:"), U_FOLD_CASE_DEFAULT) == 0) {
    str.remove(0, 6);
    isExact = true;
  }

operator<<


error << ' ' << boolean_keywords[1] << " '"
      << boolean_keywords[1] << "'";
Added in htcommon/uhelper.h

Using getBuffer() to store UnicodeString to BDb

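In htsearch/parser.cc. Not too sure this is a good idea: getBuffer() returns UTF-16 code units rather than bytes, the buffer is not guaranteed to be NUL-terminated, and indexing through a char* counts bytes, not code units.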

  temp.toLower();
  char* p = (char*)temp.getBuffer();
  if (temp.length() > maximum_word_length) p[maximum_word_length] = '\0';
  key.set_data((void*)p);
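
A possible alternative, as a sketch only: copy the truncated word into a byte buffer with an explicit length, so that nothing depends on a terminating NUL and the string's own buffer is never written to. std::u16string stands in for UnicodeString here (both hold UTF-16 code units), and make_key_bytes is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Builds the key bytes for a word, truncated to at most `max_units`
// UTF-16 code units. The bytes are in native byte order (assumption:
// the database is only read on machines with the same endianness).
inline std::string make_key_bytes(const std::u16string& word,
                                  std::size_t max_units) {
    assert(max_units > 0);  // a zero limit would only produce empty keys
    std::size_t n = word.size() < max_units ? word.size() : max_units;
    // Do not cut a surrogate pair in half: drop a trailing lead surrogate.
    if (n > 0 && n < word.size() &&
        word[n - 1] >= 0xD800 && word[n - 1] <= 0xDBFF) {
        --n;
    }
    std::string bytes(n * sizeof(char16_t), '\0');
    if (n > 0) std::memcpy(&bytes[0], word.data(), bytes.size());
    return bytes;
}
```

The result would then be handed to the Dbt with both set_data and set_size, and has to stay alive until the database call returns.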

Marc Girod