php - How can I use RegEx to determine the largest chunk between delimiters?


Can a regex determine the longest "part" of a phrase, given specified delimiters?

News story titles have a sort of structure: the title plus a bunch of garbage. Is there a way to regex out the garbage and keep only the longest part of the title? This would require using delimiters such as |, -, :, etc.

Here are some examples:

eband | jornalismo | saúde | alimentos em conserva podem causar botulismo; saiba como evitar doença

obama calls wide-range immigration reform in el paso - san jose mercury news

cl + suspensa produção de mortadela com toucinho, suspeita de contaminação

bbc news - john kerry travel pakistan amid strained ties

Not with a regex alone, I think. You can split the title on the "garbage" characters and then sort the remaining parts by length.

    $parts = preg_split('#\s*[|:+]+\s*|\s+-+\s+#u', $title); // "-" only splits between spaces
    $parts = array_combine($parts, array_map('mb_strlen', $parts)); // length in characters, not bytes
    arsort($parts);
    $longest = current(array_keys($parts));
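
For reference, here is the same split-and-compare idea wrapped in a self-contained function that can be run against the example titles. The name `longestTitlePart` is hypothetical, and this is only a sketch under a few assumptions: titles are UTF-8, the mbstring extension is available for `mb_strlen`, and a hyphen counts as a delimiter only when it has whitespace on both sides, so "wide-range" is not split. Instead of sorting, it keeps the longest part seen in a single pass.

```php
<?php
// Sketch only: longestTitlePart() is a hypothetical helper name.
// Assumes UTF-8 titles and the mbstring extension (for mb_strlen).
function longestTitlePart(string $title): string
{
    // "|", ":" and "+" always split; a hyphen only splits when it has
    // whitespace on both sides, so "wide-range" stays in one piece.
    $parts = preg_split('/\s*[|:+]+\s*|\s+-+\s+/u', trim($title), -1, PREG_SPLIT_NO_EMPTY) ?: [];

    // Single pass: keep the longest part, measured in characters, not bytes.
    // On a tie the earlier part wins because the comparison is strict.
    $longest = '';
    foreach ($parts as $part) {
        if (mb_strlen($part, 'UTF-8') > mb_strlen($longest, 'UTF-8')) {
            $longest = $part;
        }
    }
    return $longest;
}
```

For instance, `longestTitlePart("bbc news - john kerry travel pakistan amid strained ties")` returns "john kerry travel pakistan amid strained ties". On the first example it returns everything after the last "|", since ";" is not in the delimiter list.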

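Instead of listing specific delimiters, you could split on any non-word symbol. Note that \W alone also matches spaces, so exclude whitespace from the class, e.g. [^\w\s], or [^\pL\pN\s] with the /u Unicode flag so that accented letters such as "ú" are treated as letters rather than delimiters.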
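
A sketch of the symbol-based variant, under the same assumptions (UTF-8 input, mbstring available, plus PHP 7.4+ for the arrow function) and with a hypothetical function name `longestPartBySymbols`: a delimiter is any run of characters that are not letters, digits or whitespace, and a hyphen is again only treated as a delimiter when surrounded by spaces. This one sorts the parts by length, closer to the original answer.

```php
<?php
// Sketch only: longestPartBySymbols() is a hypothetical helper name.
// Assumes UTF-8 titles, PHP 7.4+ (arrow function) and the mbstring extension.
function longestPartBySymbols(string $title): string
{
    // Delimiter = a run of anything that is not a letter (\pL), a digit (\pN),
    // whitespace or a hyphen; a hyphen only splits when surrounded by whitespace.
    $parts = preg_split('/\s*[^\pL\pN\s-]+\s*|\s+-+\s+/u', trim($title), -1, PREG_SPLIT_NO_EMPTY) ?: [];

    // Sort longest first by character count (usort is stable on PHP 8+,
    // so equally long parts keep their original order).
    usort($parts, fn(string $a, string $b): int => mb_strlen($b, 'UTF-8') <=> mb_strlen($a, 'UTF-8'));

    return $parts[0] ?? '';
}
```

The trade-off is that punctuation inside the headline itself now splits it too: on the first example the ";" acts as a delimiter, so the result is "alimentos em conserva podem causar botulismo" rather than the whole sentence, and an apostrophe as in "Kerry's" would break a title apart as well.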

