Regex/wildcard replace on a string in PHP


I have a mass of text that gets loaded into the header, and within it lies this link:

<link rel="canonical" href="could_be_anything_here_at_all" /> 

I'm looking to replace it with a new value, but the href changes based on the page, meaning a simple str_replace isn't possible.

I've looked at using preg_replace, but I can't get my head around what seems like a simple problem.

    $regex = '/(^<link rel="canonical")(\/>$)/';
    $match = preg_match_all($regex, $content, $matches);
    var_dump($matches);
  • The / / start and end the expression?
  • The () indicate separate 'expressions' that have their matched string returned?
  • The ^ filters results to those that begin with the following string?
  • The $ filters results to those that end with the following string?

So I'm looking for a string that begins with <link rel="canonical" and ends with />

I've shown the steps I'm after and had a stab at it. Please help me write this and understand how to do it. I'm at a loss on this one.

The regular expression you've written won't match as it stands. Let's go over the pattern:

Whatever happens, the tag will begin with <link and end with ></link> or /> (gotta account for those pesky non-respecting-of-standards web buccaneers). You're looking for the rel parameter, if it has one, and it needs to be canonical.

We can start by writing this regular expression: #<link([^>]+)(/>|></link>)#is. This will map all link tags. You can then parse the parameters using simple strpos calls.
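As a rough sketch of that first approach, the snippet below collects every link tag with the pattern above and then checks the captured attributes for rel="canonical". The $content sample and the variable names are made up for illustration, and stripos (the case-insensitive sibling of strpos) is used here only to stay in line with the i flag.

```php
<?php
// Hypothetical sample header; in practice $content is the loaded page text.
$content = <<<HTML
<head>
<link rel="stylesheet" href="style.css" />
<link rel="canonical" href="could_be_anything_here_at_all" />
</head>
HTML;

// Group 1 captures the attributes, group 2 the closing part of the tag.
preg_match_all('#<link([^>]+)(/>|></link>)#is', $content, $matches, PREG_SET_ORDER);

$canonical = null;
foreach ($matches as $m) {
    // $m[0] is the whole tag, $m[1] the attribute string.
    if (stripos($m[1], 'rel="canonical"') !== false) {
        $canonical = $m[0];
        break;
    }
}

var_dump($canonical);
```

This keeps the regular expression simple and does not depend on the order of the attributes, at the cost of a second pass over each match.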

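If you're sure rel="canonical" is always the first parameter of the link tag, you can expand the regular expression further to #<link rel="canonical" href="?'?([^"']+)"?'?\s*(/>|></link>)#is, where \s* allows for whitespace before the closing /> as in your example tag. This will map the parameters in that order, which is fine as long as you're sure of the order.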

In order of appearance:

[^>]+ matches any character other than >, 1 or more times

The is flags stand for: case-insensitive (i), and let . match newlines too (s), so the match does not break on a newline

"?'? matches 0 or 1 ", followed by 0 or 1 '

If anything else is unclear, let me know.
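To actually swap the value, a pattern along these lines can be combined with preg_replace_callback. This is only a sketch under stated assumptions: replaceCanonical is a hypothetical helper name, the URLs are placeholders, and the pattern is a simplified variant of the one above that assumes a double-quoted href directly after rel="canonical" and tolerates whitespace before the closing part of the tag.

```php
<?php
/**
 * Hypothetical helper: replaces the href of the canonical link tag.
 * Assumes rel="canonical" comes first and the href is double-quoted.
 */
function replaceCanonical(string $content, string $newHref): string
{
    // Group 1: everything up to the opening quote of the href.
    // Group 2: the closing quote through the end of the tag.
    $pattern = '#(<link rel="canonical" href=")[^"]*("\s*(?:/>|></link>))#is';

    // A callback avoids $1 or \1 inside $newHref being read as backreferences.
    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($newHref): string {
            return $m[1] . htmlspecialchars($newHref) . $m[2];
        },
        $content
    );

    // preg_replace_callback returns null on error; fall back to the input.
    return $result ?? $content;
}

$content = '<head><link rel="canonical" href="could_be_anything_here_at_all" /></head>';
echo replaceCanonical($content, 'https://example.com/new-page'), "\n";
```

Capturing the parts around the href and putting them back unchanged means only the value between the quotes is touched, whatever it happened to be.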

Edit: to answer your questions

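  • The / / start and end the expression? They're called delimiters, and they "encase" the expression. The Perl-compatible regular expression engine allows flags to be set on the expression (i, s, m, x, etc.), and they have to sit outside of the expression itself. They go after the closing delimiter, and that's the point of the delimiter. You can use any character that isn't alphanumeric, a backslash, or whitespace, and whichever one you pick has to be escaped if it appears inside the pattern. People tend to use / due to JS using that single char for them. I tend to prefer # in PHP to clear up the / ambiguities arising from closing HTML tags.

  • The () indicate separate 'expressions' that have their matched string returned? () matches a subset and allows it to be returned in the results if you specify a variable for the matches. Every part of the regular expression can use wildcards & co, but only the stuff encased in () is returned in the matches, alongside the full match at index 0.

  • The ^ filters results to those that begin with the following string? Nope. ^ outside a [] range means the match starts with the following string, full stop. That applies to the start of the whole subject (or of each new line with the m flag), effectively, not to "words".
  • The $ filters results to those that end with the following string? Same as above, but "end" rather than "start".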
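
A small sketch of the points above, using made-up sample strings: the same pattern written with two different delimiters, what ends up in the matches array for a () group, and how ^ and $ anchor to the whole subject rather than to a tag sitting in the middle of it.

```php
<?php
// Made-up sample strings for illustration.
$tag  = '<link rel="canonical" href="a.html" />';
$page = "<head>\n" . $tag . "\n</head>";

// Delimiters: with / the slash in /> has to be escaped, with # it does not.
$slash = preg_match('/href="([^"]+)" \/>/i', $tag, $viaSlash);
$hash  = preg_match('#href="([^"]+)" />#i', $tag, $viaHash);

// Groups: index 0 holds the full match, index 1 the first () group.
echo $viaHash[0], "\n"; // href="a.html" />
echo $viaHash[1], "\n"; // a.html

// Anchors: ^ and $ refer to the whole subject unless the m flag is set.
$startTag  = preg_match('#^<link#', $tag);   // 1: the subject starts with <link
$startPage = preg_match('#^<link#', $page);  // 0: the subject starts with <head>
$startLine = preg_match('#^<link#m', $page); // 1: m lets ^ match at a line start
$endPage   = preg_match('#/>$#', $page);     // 0: the subject ends with </head>
$endLine   = preg_match('#/>$#m', $page);    // 1: m lets $ match at a line end

var_dump($slash, $hash, $startTag, $startPage, $startLine, $endPage, $endLine);
```

This is why the original attempt finds nothing in a full page of markup: without the m flag, ^ and $ only line up with the very start and end of $content.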
