sql - TSearch2 - dots explosion
The following conversion:
select to_tsvector('english', 'google.com');
returns this:
'google.com':1
Why doesn't the tsearch2 engine return this instead?
'google':2, 'com':1
Or how can I make the engine return the exploded form written above? I need "google.com" to be findable by "google".
Unfortunately, there is no quick and easy solution.
Denis is correct: the parser recognizes the string as a hostname, which is why it doesn't break it up.
There are three other things you can do, off the top of my head:
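1. You can disable host parsing in your text search configuration (see the PostgreSQL documentation for ALTER TEXT SEARCH CONFIGURATION for details), e.g.:
   ALTER TEXT SEARCH CONFIGURATION your_parser_config DROP MAPPING FOR url, url_path, host;
   Note that a token type with no mapping is simply ignored, so this stops 'google.com' from being indexed as a single term, but it does not split it into words by itself.
2. You can write your own custom dictionary.
3. You can pre-parse the data in some manner before it's inserted into the database (for example, by splitting domains before they go in).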
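A minimal sketch of the pre-parsing approach in Python, assuming you control the application code that writes the rows. The HOST_RE pattern and the explode_hosts helper are hypothetical and only approximate what counts as a hostname; PostgreSQL's own parser uses different rules.

```python
import re

# Hypothetical pattern: any run of letters/digits/hyphens containing at least
# one dot, e.g. "google.com" or "www.facebook.com". This is a simplification,
# not PostgreSQL's definition of a host token.
HOST_RE = re.compile(r"\b[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+\b")


def explode_hosts(text: str) -> str:
    """Append the dot-separated parts of every hostname-like token to text.

    The original hostname is left in place so exact searches still work;
    the parts are appended so a search for 'google' can match 'google.com'.
    """
    extra = []
    for match in HOST_RE.finditer(text):
        extra.extend(match.group(0).split("."))
    if not extra:
        return text
    return text + " " + " ".join(extra)


if __name__ == "__main__":
    print(explode_hosts("visit google.com today"))
    # visit google.com today google com
```

If the exploded string is what you store and index with to_tsvector('english', ...), the 'google.com' host token is kept and 'google' and 'com' are added as ordinary words, so a query such as to_tsquery('english', 'google') can match the row. A side effect of this simple pattern is that values like 3.14 are split too, which may or may not matter for your data.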
I had a similar issue last year and opted for solution (2) above.
My solution was to write a custom dictionary that splits words on non-word characters. A custom dictionary is a lot easier and quicker to write than a new parser. You still have to write C, though :)
For the 'www.facebook.com' domain, the dictionary I wrote would return 'www.facebook.com':4, 'com':3, 'facebook':2, 'www':1 (we had a unique-ish scenario, hence 4 results instead of 3).
The trouble with the custom dictionary is that you no longer get stemming (i.e. www.books.com comes out as www, books and com, with 'books' left unstemmed). I believe there is work (which may have been completed by now) to allow chaining of dictionaries, which would solve this problem.
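To make the custom dictionary's behavior and the stemming problem concrete, here is a rough Python model. It is not PostgreSQL's C dictionary API: split_dictionary mimics a dictionary that splits on non-word characters and keeps the whole token, and chained shows what passing its output on to a stemmer would look like. All names are hypothetical, and toy_stem is a deliberately crude stand-in for a real stemmer such as the english_stem (Snowball) dictionary.

```python
import re


def split_dictionary(token: str) -> list[str]:
    """Hypothetical model of the splitting dictionary described above.

    Splits on runs of non-word characters and, when the token had more than
    one part, also keeps the whole token as a final lexeme.
    """
    token = token.lower()
    parts = [p for p in re.split(r"\W+", token) if p]
    if len(parts) > 1:
        parts.append(token)
    return parts


def toy_stem(word: str) -> str:
    """Toy stemmer (NOT Snowball): drop a single plural 's' from longer words."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def chained(token: str) -> list[str]:
    """Feed every lexeme from the splitter through the stemmer, in order."""
    return [toy_stem(lexeme) for lexeme in split_dictionary(token)]


def with_positions(lexemes: list[str]) -> dict[str, int]:
    """Number lexemes from 1, mirroring the shape of the example output."""
    return {lexeme: pos for pos, lexeme in enumerate(lexemes, start=1)}


if __name__ == "__main__":
    print(with_positions(split_dictionary("www.facebook.com")))
    # {'www': 1, 'facebook': 2, 'com': 3, 'www.facebook.com': 4}
    print(split_dictionary("www.books.com"))
    # ['www', 'books', 'com', 'www.books.com']  (no stemming)
    print(chained("www.books.com"))
    # ['www', 'book', 'com', 'www.books.com']   (splitter, then stemmer)
```

The split-only output keeps 'books' as-is, which is the stemming gap described above; running the splitter's output through a stemmer afterwards is the kind of chaining that would close it.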