sql - TSearch2 - dots explosion


The following conversion:

select to_tsvector('english', 'google.com'); 

returns this:

'google.com':1 

Why didn't the tsearch2 engine return this?

'google':2, 'com':1 

Or how can I make the engine return the exploded string as written above? I just need "google.com" to be findable by "google".

Unfortunately, there is no quick and easy solution.

Denis is correct that the parser is recognizing it as a hostname, which is why it doesn't break it up.
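To see how the parser classifies the input, `ts_debug` can be run against the same configuration. This is a minimal sketch, assuming PostgreSQL 8.3 or later with the built-in text search (the older tsearch2 contrib module has a different function signature):

```sql
-- Show how the default parser classifies each token.
-- For 'google.com' the token is reported with the alias 'host',
-- so the whole string is kept as a single token.
SELECT alias, description, token, lexemes
FROM ts_debug('english', 'google.com');

-- For comparison, a space-separated input is parsed as ordinary word tokens.
SELECT alias, token, lexemes
FROM ts_debug('english', 'google com');
```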

There are 3 other things you can do, off the top of my head.

  1. You can disable the host parsing in the database. See the Postgres documentation for details. E.g. something like ALTER TEXT SEARCH CONFIGURATION your_parser_config DROP MAPPING FOR url, url_path

  2. You can write your own custom dictionary.

  3. You can pre-parse your data before it's inserted into the database in some manner (maybe splitting all domains before they go into the database).
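Options 1 and 3 can be sketched in plain SQL. This is only an illustration under stated assumptions: the configuration name `english_nohost` is made up for the example, and the choice to keep the original string next to the split parts is mine, not something stated above. Note that dropping a mapping makes tokens of that type be ignored by `to_tsvector`; it does not make the parser split them, so on its own option 1 will not make "google.com" findable by "google".

```sql
-- Option 1 (assumption: work on a copy so the stock 'english'
-- configuration stays untouched; 'english_nohost' is a hypothetical name).
CREATE TEXT SEARCH CONFIGURATION english_nohost (COPY = english);

-- Tokens of these types are now skipped entirely for this configuration.
ALTER TEXT SEARCH CONFIGURATION english_nohost
    DROP MAPPING FOR host, url, url_path;

-- Option 3: split on non-alphanumeric characters before indexing, and
-- keep the original string as well so the full hostname stays searchable.
SELECT to_tsvector(
           'english',
           'google.com' || ' ' ||
           regexp_replace('google.com', '[^[:alnum:]]+', ' ', 'g')
       ) @@ to_tsquery('english', 'google') AS matches;
```

With the pre-split text, the last query should report a match, because "google" now appears as a separate word token and goes through the same stemmer as the query term.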


I had a similar issue last year and opted for solution (2), above.

My solution was to write a custom dictionary that splits words up on non-word characters. A custom dictionary is a lot easier & quicker to write than a new parser. You still have to write C tho :)

The dictionary I wrote would return something like 'www.facebook.com':4, 'com':3, 'facebook':2, 'www':1 for the 'www.facebook.com' domain (we had a unique-ish scenario, hence the 4 results instead of 3).

The trouble with a custom dictionary is that you will no longer get stemming (ie: www.books.com will come out as www, books and com). I believe there is some work (which may have been completed) to allow chaining of dictionaries, which would solve this problem.

