“New Search Engine”
Differences between words (processed, similar, MySites)
For each processed word, I downloaded and processed about 16,000 WWW pages on average.
For the processed words, the page sets are constructed correctly, so their search results are relevant.
There are 60 processed words, 30 in English and 30 in Czech.
The purpose of the similar words is to show that Hlodac also searches (reasonably well) for words other than the processed ones.
A similar word can be searched within the set of WWW pages of the processed word it is similar to.
For example, the word “universities” is similar to the word “school”.
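The lookup described above can be sketched roughly as follows. This is only an illustration, not Hlodac's actual implementation: the names SIMILAR_TO_PROCESSED, PAGE_SETS and search_similar, the example URLs and the plain substring match are all assumptions made for this sketch.

```python
# Hypothetical mapping from a similar word to the processed word
# whose page set it is searched in (an assumption for illustration).
SIMILAR_TO_PROCESSED = {"universities": "school"}

# Hypothetical page sets: processed word -> list of (url, page text).
PAGE_SETS = {
    "school": [
        ("http://example.org/a", "List of universities and colleges"),
        ("http://example.org/b", "Primary school timetable"),
    ],
}


def search_similar(word):
    """Return URLs from the related processed word's page set that mention `word`."""
    key = word.lower()
    processed = SIMILAR_TO_PROCESSED.get(key)
    if processed is None:
        # The word is not registered as similar to any processed word.
        return []
    pages = PAGE_SETS.get(processed, [])
    # Plain substring match stands in for the real (unknown) ranking.
    return [url for url, text in pages if key in text.lower()]
```

Calling search_similar("universities") looks only at the pages downloaded for “school”, which is why the number of relevant links depends on how many of those pages happen to cover the similar word.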
When searching for similar words, usually only some of the top links are relevant; the others are not.
The cause is not the algorithm but a lack of data (too few WWW pages downloaded for the similar word).
That is why there are only a few relevant links for a similar word, and why the sets for these words are not constructed properly (for example, 10% of the pages are downloaded from one subset and 90% from another).
There is a large number of similar words. Selected similar words can be clicked on the Hlodac home page (see the link to the list below).
MySites words enable an exact comparison of Hlodac and Google. Google CSE (Google Custom Search Engine) is used for the comparison.
Hlodac and Google search the same 22 domains, i.e. under nearly identical conditions.
MySites words may be arbitrary; however, for a reasonable search they should relate to the content of the downloaded domains.
The number of evaluated MySites words is 232.
Thus, the total number of words that you can click on the Hlodac home page is 60 + 232 + 277 = 569 (processed, MySites and selected similar words, respectively).
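The total can be checked with a short tally. The dictionary name word_counts is made up for this sketch, and reading 277 as the number of selected similar words is an assumption inferred from the sum, since the text states only the processed and MySites counts explicitly.

```python
# Word counts taken from the text; "similar" = 277 is inferred from the sum.
word_counts = {
    "processed": 60,   # 30 English + 30 Czech
    "MySites": 232,
    "similar": 277,    # assumption: the remaining term of the sum
}

total = sum(word_counts.values())
print(total)  # 569
```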
Here are the lists of processed, similar and MySites words.
Here is the difference between processed and non-processed words.
Here is the difference between processed and similar words.