PHP: Convert HTML Entities to XML Entities

I have made a small change to this great function by regin (via sourcerally.net): it now uses preg_replace instead of str_replace, which I found to be much faster when dealing with large XML files.

This function comes in handy when you are creating XML files and need to quickly convert all the HTML entities to the correct XML entities, which reduces errors. It also helps when you load an XML file with SimpleXML, libxml etc. and get an undefined entity error. XML predefines only five named entities (lt, gt, amp, quot and apos), so HTML names such as nbsp are undefined unless a DTD declares them. You can start debugging by converting all the entities correctly first, as this is often the problem.

Function:

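function xmlEntities($str){

// Numeric character references, which any XML parser accepts without a DTD.
$xml = array('&#34;','&#38;','&#60;','&#62;','&#160;','&#161;','&#162;','&#163;','&#164;','&#165;','&#166;','&#167;','&#168;','&#169;','&#170;','&#171;','&#172;','&#173;','&#174;','&#175;','&#176;','&#177;','&#178;','&#179;','&#180;','&#181;','&#182;','&#183;','&#184;','&#185;','&#186;','&#187;','&#188;','&#189;','&#190;','&#191;','&#192;','&#193;','&#194;','&#195;','&#196;','&#197;','&#198;','&#199;','&#200;','&#201;','&#202;','&#203;','&#204;','&#205;','&#206;','&#207;','&#208;','&#209;','&#210;','&#211;','&#212;','&#213;','&#214;','&#215;','&#216;','&#217;','&#218;','&#219;','&#220;','&#221;','&#222;','&#223;','&#224;','&#225;','&#226;','&#227;','&#228;','&#229;','&#230;','&#231;','&#232;','&#233;','&#234;','&#235;','&#236;','&#237;','&#238;','&#239;','&#240;','&#241;','&#242;','&#243;','&#244;','&#245;','&#246;','&#247;','&#248;','&#249;','&#250;','&#251;','&#252;','&#253;','&#254;','&#255;');
// Named HTML entities in the same order. No /i flag: names such as &Agrave; and &agrave; differ only by case.
$html = array('/&quot;/','/&amp;/','/&lt;/','/&gt;/','/&nbsp;/','/&iexcl;/','/&cent;/','/&pound;/','/&curren;/','/&yen;/','/&brvbar;/','/&sect;/','/&uml;/','/&copy;/','/&ordf;/','/&laquo;/','/&not;/','/&shy;/','/&reg;/','/&macr;/','/&deg;/','/&plusmn;/','/&sup2;/','/&sup3;/','/&acute;/','/&micro;/','/&para;/','/&middot;/','/&cedil;/','/&sup1;/','/&ordm;/','/&raquo;/','/&frac14;/','/&frac12;/','/&frac34;/','/&iquest;/','/&Agrave;/','/&Aacute;/','/&Acirc;/','/&Atilde;/','/&Auml;/','/&Aring;/','/&AElig;/','/&Ccedil;/','/&Egrave;/','/&Eacute;/','/&Ecirc;/','/&Euml;/','/&Igrave;/','/&Iacute;/','/&Icirc;/','/&Iuml;/','/&ETH;/','/&Ntilde;/','/&Ograve;/','/&Oacute;/','/&Ocirc;/','/&Otilde;/','/&Ouml;/','/&times;/','/&Oslash;/','/&Ugrave;/','/&Uacute;/','/&Ucirc;/','/&Uuml;/','/&Yacute;/','/&THORN;/','/&szlig;/','/&agrave;/','/&aacute;/','/&acirc;/','/&atilde;/','/&auml;/','/&aring;/','/&aelig;/','/&ccedil;/','/&egrave;/','/&eacute;/','/&ecirc;/','/&euml;/','/&igrave;/','/&iacute;/','/&icirc;/','/&iuml;/','/&eth;/','/&ntilde;/','/&ograve;/','/&oacute;/','/&ocirc;/','/&otilde;/','/&ouml;/','/&divide;/','/&oslash;/','/&ugrave;/','/&uacute;/','/&ucirc;/','/&uuml;/','/&yacute;/','/&thorn;/','/&yuml;/');

// Each pattern is replaced by the entry at the same index in $xml.
$str = preg_replace($html,$xml,$str);

return $str;
}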
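
The two hand-typed lists have to stay in exactly the same order, which is easy to get wrong. As a rough alternative sketch (this is not regin's function or my version of it), the same name-to-number map could be derived from PHP's built-in HTML 4.01 table with get_html_translation_table(). The helper names namedToNumericEntities and utf8CodePoint are made up for this example.

```php
<?php
// Hypothetical helper: code point of a single UTF-8 encoded character.
function utf8CodePoint(string $char): int
{
    $bytes = array_values(unpack('C*', $char));
    $count = count($bytes);
    if ($count === 1) {
        return $bytes[0];                       // plain ASCII
    }
    $code = $bytes[0] & (0xFF >> ($count + 1)); // drop the length bits of the lead byte
    for ($i = 1; $i < $count; $i++) {
        $code = ($code << 6) | ($bytes[$i] & 0x3F);
    }
    return $code;
}

// Hypothetical helper: replace named HTML 4.01 entities with numeric references.
function namedToNumericEntities(string $str): string
{
    static $map = null;
    if ($map === null) {
        $map = [];
        // The table maps each character to its named entity, e.g. "£" => "&pound;".
        $table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');
        foreach ($table as $char => $entity) {
            $map[$entity] = '&#' . utf8CodePoint((string) $char) . ';';
        }
    }
    // strtr() is case-sensitive and never rescans its own output.
    return strtr($str, $map);
}

echo namedToNumericEntities('Caf&eacute;&nbsp;&pound;5 &amp; up'), "\n";
```

Because the map comes from PHP's table, this sketch also covers HTML 4.01 names outside the Latin-1 range, such as &euro;. Names it does not know are left untouched.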

Associated Errors:

  • Some errors you might be able to resolve using this function:
    • Undefined entity at line
    • Entity 'nbsp' not defined
    • Entity 'amp' not defined
    • Entity 'pound' not defined
    • Entity 'lt' not defined
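
To see one of these errors and the fix side by side, here is a minimal sketch that assumes the SimpleXML extension is available. The sample document is made up, and the two-entry map only stands in for the full conversion so that the example stays short.

```php
<?php
// Hypothetical sample: &pound; and &nbsp; are HTML names that XML does not predefine.
$raw = '<item><price>&pound;10&nbsp;each</price></item>';

libxml_use_internal_errors(true);   // collect parse errors instead of emitting warnings

$doc = simplexml_load_string($raw);
if ($doc === false) {
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), "\n";   // the parser's own wording for each error
    }
    libxml_clear_errors();
}

// Illustration only: swap the two named entities for numeric references.
$fixed = strtr($raw, array('&nbsp;' => '&#160;', '&pound;' => '&#163;'));

$doc = simplexml_load_string($fixed);
if ($doc !== false) {
    echo (string) $doc->price, "\n";
}
```

The first load fails because the parser has no definition for the named entities. The second succeeds because numeric references need no definition.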

Links:

http://www.sourcerally.net/Scripts/39-Convert-HTML-Entities-to-XML-Entities

Author: admin

3 comments until now

  • How elegant!
    I’m using this now.
    Thanks!

    The “Associated Errors”… are those errors that might occur if I used this function? Or errors that this function solves?

    By Pup June 18, 2009 @ 5:28 pm
  • Hmmm… produces output similar to the following:
    <><<>!<>D<>O<>C<>T<>Y<>P<>E<>><>h<>t<>m<>l<>><>P<>U<>B<>L<>I<>C<>><>”<>-<>/<>/<>W<>3<>C<>/<>/<>D<>T<>D<>><>X<>H<>T<>M<>L<>><>1<>.<>0<>><>S<>t<>r<>i<>c<>t<>/<>/<>E<>N<>”<>><>”<>h<>t<>t<>p<>:<>/<>/<>w<>w<>w<>.<>w<>3<>.<>o<>r<>g<>/<>T<>R<>/<>x<>h<>t<>m<>l<>1<>/<>D<>T<>D<>/<>x<>h<>t<>m<>l<>1<>-<>s<>t<>r<>i<>c<>t<>.<>d<>t<>d<>”<>><> <> <><<>h<>t<>m<>l<>><>x<>m<>l<>n<>s<>=<>”<>h<>t<>t<>p<>:<>/<>/<>w<>w<>w<>.<>w<>3<>.<>o<>r<>g<>/<>1<>9<>9<>9<>/<>x<>h<>t<>m<>l<>”<>><> <> <><<><

    By Pup June 18, 2009 @ 6:52 pm
  • woo-hoo! I found out I don’t need to do this with Unicode!

    By Pup June 19, 2009 @ 5:36 am
