php - How can I use RegEx to determine the largest chunk between delimiters?
Can a regex determine the longest "part" of a phrase, given specified delimiters?
News stories have a sort of structure: a title plus a bunch of garbage. Is there a way to regex out the garbage and keep only the longest part of the title, using delimiters such as "|", "-", ":", etc.?
Here are some examples:
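    eband | Jornalismo | Saúde | Alimentos em conserva podem causar botulismo; saiba como evitar doença
    Obama calls for wide-range immigration reform in El Paso - San Jose Mercury News
    cl + Suspensa produção de mortadela com toucinho, suspeita de contaminação
    BBC News - John Kerry to travel to Pakistan amid strained ties

(The Portuguese titles read roughly "eband | Journalism | Health | Canned foods can cause botulism; learn how to avoid the disease" and "cl + Production of mortadella with bacon suspended over suspected contamination".)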
Not with a regex alone, I think. You can split the title on the "garbage" characters and then sort the remaining parts by length:
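    // Split on |, : or + (with surrounding spaces), or on a hyphen between spaces, so "wide-range" stays whole
    $parts = preg_split('#\s*[|:+]+\s*|\s+-+\s+#u', $title, -1, PREG_SPLIT_NO_EMPTY);
    // Map each part to its length in characters (strlen would count bytes, so "ú" would count twice)
    $parts = array_combine($parts, array_map('mb_strlen', $parts));
    arsort($parts);
    $longest = (string) current(array_keys($parts));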
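Here is a self-contained sketch of the same idea, wrapped in a hypothetical helper named longestTitlePart(). The name is illustrative and is not from the answer. The helper uses the whitespace-aware delimiter pattern. It counts characters with preg_match_all('/./su', ...) so it does not depend on the mbstring extension. Instead of sorting, it makes a single pass that keeps the longest part seen so far. The sample titles are the examples from the question.

```php
<?php
/**
 * Hypothetical helper: return the longest segment of $title, where segments
 * are separated by |, : or + (with optional spaces), or by a hyphen that has
 * whitespace on both sides.
 */
function longestTitlePart(string $title): string
{
    // Hyphens only split when surrounded by spaces, so "wide-range" survives.
    // preg_split returns false on invalid UTF-8; treat that as "no parts".
    $parts = preg_split('/\s*[|:+]+\s*|\s+-+\s+/u', trim($title), -1, PREG_SPLIT_NO_EMPTY) ?: [];

    $best = '';
    $bestLen = -1;
    foreach ($parts as $part) {
        // Count Unicode characters rather than bytes ("ú" is 2 bytes in UTF-8).
        $len = preg_match_all('/./su', $part);
        // Strict ">" keeps the first of several equally long parts.
        if ($len > $bestLen) {
            $best = $part;
            $bestLen = $len;
        }
    }
    return $best;
}

$titles = [
    'eband | Jornalismo | Saúde | Alimentos em conserva podem causar botulismo; saiba como evitar doença',
    'Obama calls for wide-range immigration reform in El Paso - San Jose Mercury News',
    'cl + Suspensa produção de mortadela com toucinho, suspeita de contaminação',
    'BBC News - John Kerry to travel to Pakistan amid strained ties',
];

foreach ($titles as $title) {
    echo longestTitlePart($title), PHP_EOL;
}
```

A linear scan is enough when only the single longest part is needed. The array_combine/arsort version is handier if you also want the runners-up, but note that it collapses duplicate parts into one key.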
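Instead of listing specific delimiters, you could split on any non-word symbol: [^\w\s], or [^\pL\pN\s] with the /u Unicode flag so that accented letters count as letters. Keep \s out of the class. On their own, \W and [^\pL] also match spaces and would split the title into single words.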
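Here is a minimal sketch of the symbol-based split, using a hypothetical helper named splitOnSymbols() (the name is illustrative). The second call shows the trade-off of this approach. An in-word hyphen is also a symbol, so "wide-range" gets split apart, and commas or semicolons inside a title would split it too.

```php
<?php
/**
 * Hypothetical helper: split a title on runs of characters that are neither
 * letters, digits nor whitespace, together with any spaces around them.
 * The /u flag makes \pL and \pN match accented letters such as "ç" or "ú".
 */
function splitOnSymbols(string $title): array
{
    // preg_split returns false on invalid UTF-8; fall back to an empty array.
    return preg_split('/\s*[^\pL\pN\s]+\s*/u', $title, -1, PREG_SPLIT_NO_EMPTY) ?: [];
}

$parts = splitOnSymbols('BBC News - John Kerry to travel to Pakistan amid strained ties');
print_r($parts); // elements: "BBC News", "John Kerry to travel to Pakistan amid strained ties"

// Trade-off: the in-word hyphen in "wide-range" is a symbol too, so it splits.
print_r(splitOnSymbols('Obama calls for wide-range immigration reform in El Paso - San Jose Mercury News'));
```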