Project „New Search Engine“
Evaluation of Results

Evaluation by means of single words (Comparison with Google, Comparison with Bing, Comparison with Seznam)
Evaluation by means of Google CSE
Conclusions - Remarks

I downloaded and processed about 1 million WWW pages and evaluated 292 words.

1. Evaluation by means of single words

60 words were evaluated: 30 English and 30 Czech.
For each evaluated word, about 16 thousand WWW pages on average were downloaded and processed.
The evaluation was performed during January 2012.

Criteria

First page good
Number of good (suitable) links on the first page of found links, i.e. on positions 1-10.
Weight of the criterion: 0.67.
Second page good
Number of good (suitable) links on the second page of found links, i.e. on positions 11-20.
Weight of the criterion: 0.33.

Difference

Computation: for each engine the number of good links is divided by 10; the difference (Hlodac minus the compared engine) is then computed and converted to a percentage.
The resulting difference is the weighted average of the average difference for the first criterion and the average difference for the second criterion.
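
The computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and constant names are hypothetical, and the two sample rows are the "cars" and "Internet (en)" rows of the Google comparison.

```python
# Weights of the two criteria, as given in the Criteria section.
WEIGHT_FIRST_PAGE = 0.67
WEIGHT_SECOND_PAGE = 0.33
LINKS_PER_PAGE = 10


def difference_percent(good_hlodac, good_other, links=LINKS_PER_PAGE):
    """Difference between the shares of good links, in percent."""
    return (good_hlodac - good_other) * 100 / links


def resulting_difference(rows):
    """rows: (hlodac_first, other_first, hlodac_second, other_second) per word."""
    first = [difference_percent(h1, o1) for h1, o1, _, _ in rows]
    second = [difference_percent(h2, o2) for _, _, h2, o2 in rows]
    avg_first = sum(first) / len(first)
    avg_second = sum(second) / len(second)
    # Weighted average of the two per-criterion averages.
    return WEIGHT_FIRST_PAGE * avg_first + WEIGHT_SECOND_PAGE * avg_second


# Two rows of the Google comparison: "cars" and "Internet (en)".
sample = [(7, 3, 4, 5), (10, 7, 10, 7)]
print(difference_percent(7, 3))       # first-page difference for "cars"
print(resulting_difference(sample))   # weighted result for the two sample rows
```

With the full 30 or 60 rows of a table as input, the same function would yield the resulting differences reported for each comparison.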

1.1. Comparison with Google

Table of results



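| Word | Hlodac first page good | Google first page good | Difference first page good (%) | Hlodac second page good | Google second page good | Difference second page good (%) | Hlodac saved search results, first page | Hlodac saved search results, second page | Google saved search results, first page | Google saved search results, second page |
|---|---|---|---|---|---|---|---|---|---|---|
| English words | | | | | | | | | | |
| cars | 7 | 3 | +40 | 4 | 5 | -10 | links | links | links | links |
| Internet (en) | 10 | 7 | +30 | 10 | 7 | +30 | links | links | links | links |
| sex (en) | 9 | 10 | -10 | 10 | 5 | +50 | links | links | links | links |
| restaurants | 4 | 7 | -30 | 3 | 1 | +30 | links | links | links | links |
| Sun | 6 | 2 | +40 | 3 | 2 | +10 | links | links | links | links |
| export (en) | 6 | 6 | 0 | 7 | 1 | +60 | links | links | links | links |
| architects | 9 | 2 | +70 | 8 | 1 | +70 | links | links | links | links |
| musical | 9 | 7 | +20 | 10 | 1 | +90 | links | links | links | links |
| banks | 7 | 1 | +60 | 8 | 0 | +80 | links | links | links | links |
| book | 8 | 7 | +10 | 10 | 1 | +90 | links | links | links | links |
| concerts | 10 | 6 | +40 | 10 | 6 | +40 | links | links | links | links |
| country (en) | 9 | 9 | 0 | 10 | 7 | +30 | links | links | links | links |
| disco | 10 | 4 | +60 | 9 | 0 | +90 | links | links | links | links |
| dulcimer | 9 | 6 | +30 | 9 | 6 | +30 | links | links | links | links |
| escort (en) | 8 | 4 | +40 | 8 | 2 | +60 | links | links | links | links |
| food | 10 | 10 | 0 | 8 | 8 | 0 | links | links | links | links |
| foundation | 3 | 2 | +10 | 4 | 0 | +40 | links | links | links | links |
| fruit | 8 | 7 | +10 | 7 | 5 | +20 | links | links | links | links |
| hockey | 10 | 7 | +30 | 9 | 6 | +30 | links | links | links | links |
| hospitals | 7 | 9 | -20 | 8 | 2 | +60 | links | links | links | links |
| chemistry | 8 | 9 | -10 | 9 | 8 | +10 | links | links | links | links |
| investment | 10 | 7 | +30 | 10 | 8 | +20 | links | links | links | links |
| mobile phone | 6 | 3 | +30 | 8 | 3 | +50 | links | links | links | links |
| pens | 10 | 2 | +80 | 9 | 10 | -10 | links | links | links | links |
| raw materials | 8 | 4 | +40 | 9 | 3 | +60 | links | links | links | links |
| ships | 8 | 7 | +10 | 9 | 2 | +70 | links | links | links | links |
| school | 8 | 5 | +30 | 9 | 5 | +40 | links | links | links | links |
| tree | 7 | 3 | +40 | 3 | 0 | +30 | links | links | links | links |
| water | 6 | 8 | -20 | 5 | 6 | -10 | links | links | links | links |
| history | 8 | 5 | +30 | 10 | 7 | +30 | links | links | links | links |
| sum (en) | | | +690 | | | +1190 | | | | |
| number (en) | | | 30 | | | 30 | | | | |
| average (en) | | | 23.00 | | | 39.67 | | | | |
| Czech words | | | | | | | | | | |
| Lednice | 8 | 8 | 0 | 4 | 1 | +30 | links | links | links | links * |
| Valtice | 10 | 9 | +10 | 4 | 2 | +20 | links | links | links | links |
| Mikulov | 8 | 5 | +30 | 2 | 2 | 0 | links | links | links | links |
| Morava | 8 | 5 | +30 | 3 | 2 | +10 | links | links | links | links |
| mobil | 10 | 7 | +30 | 10 | 3 | +70 | links | links | links | links |
| country (cz) | 6 | 6 | 0 | 3 | 0 | +30 | links | links | links | links |
| muzikál | 6 | 8 | -20 | 4 | 1 | +30 | links | links | links | links |
| jídlo | 8 | 4 | +40 | 5 | 7 | -20 | links | links | links | links |
| škola | 10 | 9 | +10 | 8 | 2 | +60 | links | links | links | links |
| kniha | 9 | 8 | +10 | 10 | 5 | +50 | links | links | links | links |
| koncerty | 8 | 7 | +10 | 3 | 2 | +10 | links | links | links | links |
| hokej | 9 | 9 | 0 | 8 | 6 | +20 | links | links | links | links |
| Internet (cz) | 5 | 5 | 0 | 4 | 3 | +10 | links | links | links | links |
| strom | 3 | 2 | +10 | 6 | 0 | +60 | links | links | links | links |
| Slunce | 5 | 7 | -20 | 0 | 2 | -20 | links | links | links | links |
| investice | 9 | 8 | 0 | 8 | 5 | +30 | links | links | links | links |
| banky | 6 | 5 | +10 | 10 | 3 | +70 | links | links | links | links |
| escort (cz) | 5 | 5 | 0 | 7 | 5 | +20 | links | links | links | links |
| pera | 9 | 5 | +40 | 8 | 6 | +20 | links | links | links | links |
| ovoce | 8 | 7 | +10 | 8 | 2 | +60 | links | links | links | links |
| export (cz) | 6 | 6 | 0 | 3 | 3 | 0 | links | links | links | links |
| lodě | 9 | 9 | 0 | 8 | 4 | +40 | links | links | links | links |
| architekti | 8 | 5 | +30 | 6 | 0 | +60 | links | links | links | links |
| nemocnice | 3 | 1 | +20 | 0 | 0 | 0 | links | links | links | links |
| nadace | 3 | 2 | +10 | 2 | 0 | +20 | links | links | links | links |
| voda | 6 | 8 | -20 | 7 | 4 | +30 | links | links | links | links |
| cimbál | 2 | 2 | 0 | 0 | 2 | -20 | links | links | links | links |
| chemie | 8 | 6 | +20 | 7 | 7 | 0 | links | links | links | links |
| diskotéka | 6 | 3 | +30 | 5 | 1 | +40 | links | links | links | links |
| suroviny | 8 | 7 | +10 | 6 | 3 | +30 | links | links | links | links |
| sum (cz) | | | +290 | | | +760 | | | | |
| number (cz) | | | 30 | | | 30 | | | | |
| average (cz) | | | +9.67 | | | +25.33 | | | | |
| sum (all words) | | | +980 | | | +1950 | | | | |
| number (all words) | | | 60 | | | 60 | | | | |
| average (all words) | | | +16.33 | | | +32.50 | | | | |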


Resulting differences
English words: Hlodac - Google = 0.67*23.00+0.33*39.67 = 28.50 percent.
Czech words: Hlodac - Google = 0.67*9.67+0.33*25.33 = 14.84 percent.
All words (together): Hlodac - Google = 0.67*16.33+0.33*32.50 = 21.67 percent.

* The word "Lednice" is written with a capital L, i.e. the city, not "lednice" (refrigerator).
Since only links to the city are considered, and the comparison needs the first 20 such links, the first 5 pages of Google results had to be saved.
Here are the links to pages 3-5 of the found links: links3 - links4 - links5.

Remarks

1. For the single-word evaluation, 60 words and about 1 million WWW pages were used.
2. For the English words google.com (English) was used; for the Czech words google.cz (Czech) was used.
3. For the evaluation of Google only the basic (external) links were used; there are 10 such links on every page of found links. The following links were not used: advertising links, sponsored links, AdWords links, and inserted (internal) links to news, images and videos.
4. When evaluating the Czech words, links to English pages are considered bad (this holds for Hlodac as well as for Google).
5. The comparison was performed on the saved search results (see the links in the table) from January 2012. Current search results may differ, because both the WWW pages and the Google algorithm change over time.
6. The resulting difference is larger than the theoretical estimate. I had estimated theoretically that the ranking of Hlodac (based on sets) would be about 10 percent better than the ranking of Google (based on single WWW pages). In the practical single-word comparison, Hlodac came out about 20 percent better than Google.


1.2. Comparison with Bing

Table of results



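| Word | Hlodac first page good | Bing first page good | Difference first page good (%) | Hlodac second page good | Bing second page good | Difference second page good (%) | Hlodac saved search results, first page | Hlodac saved search results, second page | Bing saved search results, first page | Bing saved search results, second page |
|---|---|---|---|---|---|---|---|---|---|---|
| English words | | | | | | | | | | |
| cars | 7 | 5 | +20 | 4 | 2 | +20 | links | links | links | links |
| Internet (en) | 10 | 5 | +50 | 10 | 8 | +20 | links | links | links | links |
| sex (en) | 9 | 10 | -10 | 10 | 7 | +30 | links | links | links | links |
| restaurants | 4 | 6 | -20 | 3 | 1 | +20 | links | links | links | links |
| Sun | 6 | 7 | -10 | 3 | 6 | -30 | links | links | links | links |
| export (en) | 6 | 5 | +10 | 7 | 4 | +30 | links | links | links | links |
| architects | 9 | 4 | +50 | 8 | 2 | +60 | links | links | links | links |
| musical | 9 | 9 | 0 | 10 | 6 | +40 | links | links | links | links |
| banks | 7 | 3 | +40 | 8 | 3 | +50 | links | links | links | links |
| book | 8 | 7 | +10 | 10 | 5 | +50 | links | links | links | links |
| concerts | 10 | 9 | +10 | 10 | 3 | +70 | links | links | links | links |
| country (en) | 9 | 9 | 0 | 10 | 3 | +70 | links | links | links | links |
| disco | 10 | 5 | +50 | 9 | 3 | +60 | links | links | links | links |
| dulcimer | 9 | 8 | +10 | 9 | 9 | 0 | links | links | links | links |
| escort (en) | 8 | 5 | +30 | 8 | 8 | 0 | links | links | links | links |
| food | 10 | 10 | 0 | 8 | 6 | -20 | links | links | links | links |
| foundation | 3 | 4 | -10 | 4 | 1 | +30 | links | links | links | links |
| fruit | 8 | 7 | +10 | 7 | 6 | +10 | links | links | links | links |
| hockey | 10 | 5 | +50 | 9 | 5 | +40 | links | links | links | links |
| hospitals | 7 | 6 | +10 | 8 | 1 | +70 | links | links | links | links |
| chemistry | 8 | 8 | 0 | 9 | 7 | +20 | links | links | links | links |
| investment | 10 | 8 | +20 | 10 | 6 | +40 | links | links | links | links |
| mobile phone | 6 | 5 | +10 | 8 | 3 | +50 | links | links | links | links |
| pens | 10 | 7 | +30 | 9 | 7 | +20 | links | links | links | links |
| raw materials | 8 | 3 | +50 | 9 | 1 | +80 | links | links | links | links |
| ships | 8 | 4 | +40 | 9 | 3 | +60 | links | links | links | links |
| school | 8 | 5 | +30 | 9 | 3 | +60 | links | links | links | links |
| tree | 7 | 8 | -10 | 3 | 0 | +30 | links | links | links | links |
| water | 6 | 9 | -30 | 5 | 2 | +30 | links | links | links | links |
| history | 8 | 7 | +10 | 10 | 6 | +40 | links | links | links | links |
| sum (en) | | | +450 | | | +1050 | | | | |
| number (en) | | | 30 | | | 30 | | | | |
| average (en) | | | +15.00 | | | 35.00 | | | | |
| Czech words | | | | | | | | | | |
| Lednice | 8 | 9 | -10 | 4 | 6 | -20 | links | links | links | links * |
| Valtice | 10 | 9 | +10 | 4 | 5 | -10 | links | links | links | links |
| Mikulov | 8 | 6 | +20 | 2 | 4 | -20 | links | links | links | links |
| Morava | 8 | 2 | +60 | 3 | 0 | +30 | links | links | links | links |
| mobil | 10 | 0 | +100 | 10 | 0 | +100 | links | links | links | links |
| country (cz) | 6 | 6 | 0 | 3 | 3 | 0 | links | links | links | links |
| muzikál | 6 | 0 | +60 | 4 | 0 | +40 | links | links | links | links |
| jídlo | 8 | 3 | +50 | 5 | 2 | +30 | links | links | links | links |
| škola | 10 | 2 | +80 | 8 | 0 | +80 | links | links | links | links |
| kniha | 9 | 3 | +60 | 10 | 0 | +100 | links | links | links | links |
| koncerty | 8 | 0 | +80 | 3 | 0 | +30 | links | links | links | links |
| hokej | 9 | 2 | +70 | 8 | 0 | +80 | links | links | links | links |
| Internet (cz) | 5 | 4 | +10 | 4 | 4 | 0 | links | links | links | links |
| strom | 3 | 0 | +30 | 6 | 0 | +60 | links | links | links | links |
| Slunce | 5 | 2 | +30 | 0 | 0 | 0 | links | links | links | links |
| investice | 9 | 1 | +80 | 8 | 2 | +60 | links | links | links | links |
| banky | 6 | 0 | +60 | 10 | 0 | +100 | links | links | links | links |
| escort (cz) | 5 | 6 | -10 | 7 | 5 | +20 | links | links | links | links |
| pera | 9 | 0 | +90 | 8 | 0 | +80 | links | links | links | links |
| ovoce | 8 | 0 | +80 | 8 | 0 | +80 | links | links | links | links |
| export (cz) | 6 | 3 | +30 | 3 | 1 | +20 | links | links | links | links |
| lodě | 9 | 4 | +50 | 8 | 2 | +60 | links | links | links | links |
| architekti | 8 | 0 | +80 | 6 | 0 | +60 | links | links | links | links |
| nemocnice | 3 | 0 | +30 | 0 | 0 | 0 | links | links | links | links |
| nadace | 3 | 0 | +30 | 2 | 0 | +20 | links | links | links | links |
| voda | 6 | 0 | +60 | 7 | 0 | +70 | links | links | links | links |
| cimbál | 2 | 1 | +10 | 0 | 0 | 0 | links | links | links | links |
| chemie | 8 | 0 | +80 | 7 | 0 | +70 | links | links | links | links |
| diskotéka | 6 | 1 | +50 | 5 | 0 | +50 | links | links | links | links |
| suroviny | 8 | 2 | +60 | 6 | 0 | +60 | links | links | links | links |
| sum (cz) | | | +1420 | | | +1250 | | | | |
| number (cz) | | | 30 | | | 30 | | | | |
| average (cz) | | | +47.33 | | | +41.67 | | | | |
| sum (all words) | | | +1870 | | | +2300 | | | | |
| number (all words) | | | 60 | | | 60 | | | | |
| average (all words) | | | 31.17 | | | 38.33 | | | | |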


Resulting differences
English words: Hlodac - Bing = 0.67*15.00+0.33*35.00 = 21.60 percent.
Czech words: Hlodac - Bing = 0.67*47.33+0.33*41.67 = 45.46 percent.
All words (together): Hlodac - Bing = 0.67*31.17+0.33*38.33 = 33.53 percent.
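
As a cross-check of the arithmetic, the three resulting differences can be recomputed from the sums in the table. The sketch below is illustrative only: the helper name is hypothetical, and it assumes the averages are rounded to two decimals before weighting, as they appear in the table.

```python
WEIGHT_FIRST_PAGE = 0.67
WEIGHT_SECOND_PAGE = 0.33


def resulting_from_sums(sum_first, sum_second, words):
    """Average the summed differences, round as in the table, then weight."""
    avg_first = round(sum_first / words, 2)
    avg_second = round(sum_second / words, 2)
    result = WEIGHT_FIRST_PAGE * avg_first + WEIGHT_SECOND_PAGE * avg_second
    return avg_first, avg_second, round(result, 2)


# Sums of differences and word counts taken from the Bing table.
bing = {
    "English words": (450, 1050, 30),
    "Czech words": (1420, 1250, 30),
    "All words": (1870, 2300, 60),
}
for group, (s1, s2, n) in bing.items():
    print(group, resulting_from_sums(s1, s2, n))
```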

Remarks


1. For the single-word evaluation, 60 words and about 1 million WWW pages were used.
2. For the English words bing.com (International, USA, English) was used; for the Czech words bing.cz was used.
3. For the evaluation of Bing only the basic (external) links were used; there are 10 such links on every page of found links. The following links were not used: advertising links, sponsored links, and inserted (internal) links to news, images and videos.
4. For the Czech words:
- For Hlodac, links to English pages are considered bad.
- Bing mixes links to Czech pages with links to world pages (English and other languages). For Bing, links to relevant English pages are therefore tolerated as good; otherwise nearly all Bing links would be bad.
5. The comparison was performed on the saved search results (see the links in the table) from January 2012. Current search results may differ, because both the WWW pages and the Bing algorithm change over time.


1.3. Comparison with Seznam

Table of results



| Word | Hlodac first page good | Seznam first page good | Difference first page good (%) | Hlodac second page good | Seznam second page good | Difference second page good (%) | Hlodac saved search results, first page | Hlodac saved search results, second page | Seznam saved search results, first page | Seznam saved search results, second page |
|---|---|---|---|---|---|---|---|---|---|---|
| Czech words | | | | | | | | | | |
| Lednice | 8 | 8 | 0 | 4 | 1 | +30 | links | links | links | links * |
| Valtice | 10 | 7 | +30 | 4 | 3 | +10 | links | links | links | links |
| Mikulov | 8 | 5 | +30 | 2 | 4 | -20 | links | links | links | links |
| Morava | 8 | 7 | +10 | 3 | 2 | +10 | links | links | links | links |
| mobil | 10 | 9 | +10 | 10 | 7 | +30 | links | links | links | links |
| country (cz) | 6 | 7 | -10 | 3 | 1 | +20 | links | links | links | links |
| muzikál | 6 | 3 | +30 | 4 | 3 | +10 | links | links | links | links |
| jídlo | 8 | 3 | +50 | 5 | 4 | +10 | links | links | links | links |
| škola | 10 | 8 | +20 | 8 | 7 | +10 | links | links | links | links |
| kniha | 9 | 10 | -10 | 10 | 8 | +20 | links | links | links | links |
| koncerty | 8 | 6 | +20 | 3 | 4 | -10 | links | links | links | links |
| hokej | 9 | 7 | +20 | 8 | 4 | +40 | links | links | links | links |
| Internet (cz) | 5 | 4 | +10 | 4 | 2 | +20 | links | links | links | links |
| strom | 3 | 1 | +20 | 6 | 0 | +60 | links | links | links | links |
| Slunce | 5 | 8 | -30 | 0 | 1 | -10 | links | links | links | links |
| investice | 9 | 9 | 0 | 8 | 1 | +70 | links | links | links | links |
| banky | 6 | 2 | +40 | 10 | 2 | +80 | links | links | links | links |
| escort (cz) | 5 | 4 | +10 | 7 | 4 | +30 | links | links | links | links |
| pera | 9 | 9 | 0 | 8 | 5 | +30 | links | links | links | links |
| ovoce | 8 | 4 | +40 | 8 | 2 | +60 | links | links | links | links |
| export (cz) | 6 | 3 | +30 | 3 | 2 | +10 | links | links | links | links |
| lodě | 9 | 7 | +20 | 8 | 7 | +10 | links | links | links | links |
| architekti | 8 | 7 | +10 | 6 | 0 | +60 | links | links | links | links |
| nemocnice | 3 | 0 | +30 | 0 | 0 | 0 | links | links | links | links |
| nadace | 3 | 0 | +30 | 2 | 2 | 0 | links | links | links | links |
| voda | 6 | 3 | +30 | 7 | 2 | +50 | links | links | links | links |
| cimbál | 2 | 2 | 0 | 0 | 0 | 0 | links | links | links | links |
| chemie | 8 | 8 | 0 | 7 | 5 | +20 | links | links | links | links |
| diskotéka | 6 | 5 | +10 | 5 | 2 | +30 | links | links | links | links |
| suroviny | 8 | 4 | +40 | 6 | 4 | +20 | links | links | links | links |
| sum | | | +480 | | | +700 | | | | |
| number | | | 30 | | | 30 | | | | |
| average | | | 16.00 | | | 23.33 | | | | |


Resulting difference
Hlodac - Seznam = 0.67*16.00+ 0.33*23.33 = 18.42 percent.

* The word "Lednice" is written with a capital L, i.e. the city, not "lednice" (refrigerator).
Since only links to the city are considered, and the comparison needs the first 20 such links, the first 7 pages of Seznam results had to be saved.
Here are the links to pages 3-7 of the found links: links3 - links4 - links5 - links6 - links7.

Remarks

1. For the single-word evaluation, 30 words and about 300 thousand WWW pages were used.
2. Only Czech words were compared, because Seznam uses Bing for world (English) searching.
3. Since Czech words are searched, the Seznam search was run from the home page seznam.cz, where "Czech" search is the default (see the pages of found links).
4. For the evaluation of Seznam only the basic (external) links were used; there are 10 such links on every page of found links. The following links were not used: advertising links, sponsored links, Sklik links, and inserted (internal) links to news, images and videos.
5. Since only Czech words are compared with Seznam, links to English pages are considered bad (this holds for Hlodac as well as for Seznam).
6. The comparison was performed on the saved search results (see the links in the table) from January 2012. Current search results may differ, because both the WWW pages and the Seznam algorithm change over time.


2. Evaluation by means of Google CSE (Google Custom Search)

Google CSE makes it possible to compare the search results of Hlodac and Google on a chosen set of domains.
With Google CSE, both Hlodac and Google search within the same set of domains (WWW pages), i.e. under nearly equal conditions.
The WWW pages of these 22 domains were downloaded and processed: http://www.jiznimorava.org/servery
The total number of WWW pages in these domains is around 3000.
I evaluated 232 Czech words.
For each word only the first 5 links were evaluated, because comparing the first 10 links would not be meaningful:
both Hlodac and Google nearly always place the good links that these domains offer for the searched word within the first 10 links.

Criterion

Good of first 5 links
Number of good (suitable) links on the first 5 positions of the found links.
Weight of the criterion: 1.00.

Difference

Computation: for each engine the number of good links is divided by 5; the difference (Hlodac minus Google) is then computed and converted to a percentage.
The resulting difference is the average of the per-word differences.
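
For the single-criterion case the computation reduces to a plain average. A minimal sketch, with hypothetical function names and the first four table rows (Lednice, Valtice, Mikulov, Morava) as sample input:

```python
LINKS_EVALUATED = 5


def cse_difference(good_hlodac, good_google, links=LINKS_EVALUATED):
    """Per-word difference between the shares of good links, in percent."""
    return (good_hlodac - good_google) * 100 / links


def average_difference(rows):
    """rows: (good_hlodac, good_google) per word; returns the mean difference."""
    diffs = [cse_difference(h, g) for h, g in rows]
    return sum(diffs) / len(diffs)


# First four rows of the table: Lednice, Valtice, Mikulov, Morava.
sample = [(5, 3), (2, 2), (3, 2), (5, 3)]
print([cse_difference(h, g) for h, g in sample])
print(average_difference(sample))
```

Because the weight of the only criterion is 1.00, no weighting step is needed here.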

Table of results

| Word | Hlodac good of first 5 links | Google good of first 5 links | Difference good of first 5 links (%) | Hlodac saved search results, first page | Google saved search results, first page |
|---|---|---|---|---|---|
| Czech words | | | | | |
| Lednice | 5 | 3 | +40 | links | links * |
| Valtice | 2 | 2 | 0 | links | links |
| Mikulov | 3 | 2 | +20 | links | links |
| Morava | 5 | 3 | +40 | links | links |
| Lednicko-valtický areál | 0 | 3 | -60 | links | links |
| historie | 4 | 1 | +60 | links | links |
| legendy | 3 | 1 | +40 | links | links |
| víno | 4 | 1 | +60 | links | links |
| zpěvník | 3 | 2 | +20 | links | links |
| country | 2 | 2 | 0 | links | links |
| koledy | 2 | 1 | +20 | links | links |
| muzikál | 2 | 4 | -40 | links | links |
| relaxace | 2 | 2 | 0 | links | links |
| vtipy | 2 | 3 | -20 | links | links |
| koně | 4 | 3 | +20 | links | links |
| Alveo | 2 | 2 | 0 | links | links |
| kuřáci | 2 | 3 | -20 | links | links |
| Martiš | 2 | 1 | +20 | links | links |
| hledač | 1 | 1 | 0 | links | links |
| Hlodac | 2 | 2 | 0 | links | links |
| Lednice Valtice | 5 | 4 | +20 | links | links |
| Jižní Morava | 2 | 2 | 0 | links | links |
| služby | 5 | 1 | +80 | links | links |
| ubytování | 5 | 1 | +80 | links | links |
| hotely | 3 | 1 | +40 | links | links |
| penziony | 3 | 1 | +40 | links | links |
| stravování | 4 | 2 | +40 | links | links |
| restaurace | 5 | 1 | +80 | links | links |
| zábava | 3 | 2 | +20 | links | links |
| atrakce | 5 | 3 | +40 | links | links |
| noční kluby | 1 | 1 | 0 | links | links |
| památky | 4 | 2 | +40 | links | links |
| hrady | 2 | 2 | 0 | links | links |
| zámky | 2 | 1 | +20 | links | links |
| salety | 4 | 2 | +40 | links | links |
| města | 3 | 1 | +20 | links | links |
| příroda | 5 | 0 | +100 | links | links |
| Pálava | 3 | 1 | +40 | links | links |
| Pálavské vrchy | 4 | 2 | +40 | links | links |
| Pavlovské vrchy | 4 | 2 | +40 | links | links |
| vrchy | 2 | 0 | +20 | links | links |
| Martinka | 4 | 1 | +60 | links | links |
| kultura | 2 | 0 | +40 | links | links |
| sport | 5 | 1 | +80 | links | links |
| fotografie | 5 | 1 | +80 | links | links |
| mapy | 3 | 0 | +60 | links | links |
| okolí | 5 | 2 | +60 | links | links |
| region | 1 | 0 | 0 | links | links |
| turistika | 2 | 3 | -20 | links | links |
| kontakty | 5 | 2 | +60 | links | links |
| Petr Hejl | 3 | 4 | -20 | links | links |
| náplně | 5 | 0 | +100 | links | links |
| turistické cíle | 5 | 1 | +80 | links | links |
| trasy | 2 | 3 | -20 | links | links |
| nejlepší | 5 | 0 | +100 | links | links |
| průvodci | 5 | 0 | +100 | links | links |
| úřady | 4 | 3 | +20 | links | links |
| Czech republika | 3 | 2 | +20 | links | links |
| Evropská Unie | 4 | 1 | +60 | links | links |
| OSN | 4 | 0 | +80 | links | links |
| UNESCO | 3 | 0 | +60 | links | links |
| Světové dědictví | 1 | 2 | -20 | links | links |
| oblíbené položky | 2 | 2 | 0 | links | links |
| WWW odkazy | 5 | 1 | +80 | links | links |
| akce | 2 | 1 | +20 | links | links |
| zábavné akce | 2 | 2 | 0 | links | links |
| společenské akce | 1 | 3 | -40 | links | links |
| firemní akce | 2 | 2 | 0 | links | links |
| outdoor | 0 | 2 | 0 | links | links |
| jídelní lístky | 3 | 1 | +40 | links | links |
| místnosti | 3 | 1 | +40 | links | links |
| jízdárny | 3 | 1 | +40 | links | links |
| venkovní prostory | 3 | 1 | +40 | links | links |
| Terasa Onyx | 4 | 1 | +60 | links | links |
| konference | 2 | 1 | +20 | links | links |
| školení | 2 | 2 | 0 | links | links |
| prezentace | 2 | 2 | 0 | links | links |
| coffee break | 3 | 0 | +60 | links | links |
| hostina | 3 | 1 | +40 | links | links |
| pečené sele | 3 | 0 | +60 | links | links |
| barbeque | 3 | 2 | +20 | links | links |
| zabíjačka | 4 | 2 | +40 | links | links |
| zvěřinové hody | 4 | 1 | +60 | links | links |
| degustace | 3 | 3 | 0 | links | links |
| Švejk | 3 | 0 | +60 | links | links |
| kouzelníci | 3 | 1 | +40 | links | links |
| historické pořady | 3 | 1 | +40 | links | links |
| živá prohlídka | 2 | 1 | +20 | links | links |
| módní přehlídka | 1 | 2 | -20 | links | links |
| ohňostroj | 3 | 0 | +60 | links | links |
| svatba | 4 | 0 | +80 | links | links |
| zámecký program | 2 | 1 | +20 | links | links |
| Zámek Lednice | 3 | 3 | 0 | links | links |
| Jeskyně Grotta | 2 | 1 | +20 | links | links |
| skleník | 3 | 1 | +40 | links | links |
| zahrada | 3 | 1 | +40 | links | links |

park