Extract XML from a .prt file using PHP, but the file becomes unreadable when opened with PHP

Tags: php xml file
By: KevBot
Source: Stackoverflow.com
Question

I have a .prt file (a CAD design file) that I need to extract some XML from using PHP. When I view this file directly in the browser, I can see the XML along with some unreadable areas. However, when I open it with PHP to get the XML I need, the file becomes mostly unreadable and the XML is nowhere to be found; it looks as if the file were encrypted.

This is an example of what the .prt file looks like when opened directly in the browser: [screenshot: File in Browser]

This is an example of what the file looks like when opened using PHP: [screenshot: Using PHP]

This is how I am trying to open the file with PHP:

$handle = fopen("thePart.prt", "rb");
$contents = trim(stream_get_contents($handle));
fclose($handle);
//echo out contents to see what happens
echo $contents;

If I can get PHP to open this file without it turning unreadable, I can get the XML out of it myself. How do I fix this? Thank you very much in advance.



Answers

Real Answer

It turns out there was no problem with the code at all. The browser was interpreting the XML tags as HTML markup, so they were not displayed as text (PHP sends a Content-Type of text/html by default). In the page source, the XML was plainly visible. The XML can also be seen without viewing the source by setting the content type in the PHP script:

header('Content-Type: text/plain');

This way, the browser will just display the XML as it is, without attempting to parse it as HTML first.
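A minimal sketch putting this together, under some assumptions: "thePart.prt" is the file name from the question, while extract_xml() and the root tag name are hypothetical, since the layout of the XML inside a .prt file is not known here. The script serves the raw bytes as text/plain when run through a web server, and the helper shows one way to pull the embedded XML out once you know how it starts and ends.

```php
<?php
// Sketch only: "thePart.prt" comes from the question; extract_xml() and the
// root tag name are hypothetical, since the .prt XML layout is unknown.

// Return the first XML document embedded in $contents, or null if not found.
// Starts at an XML declaration if there is one, otherwise at the opening root
// tag, and ends after the first matching closing root tag.
function extract_xml(string $contents, string $rootTag): ?string
{
    $start = strpos($contents, '<?xml');
    if ($start === false) {
        $start = strpos($contents, '<' . $rootTag);
    }
    if ($start === false) {
        return null;
    }
    $closeTag = '</' . $rootTag . '>';
    $end = strpos($contents, $closeTag, $start);
    if ($end === false) {
        return null;
    }
    return substr($contents, $start, $end + strlen($closeTag) - $start);
}

// When served by a web server, send the raw file as text/plain so the
// browser shows the XML tags instead of treating them as HTML.
if (PHP_SAPI !== 'cli') {
    $contents = file_get_contents('thePart.prt');
    if ($contents === false) {
        http_response_code(500);
        exit('Could not read thePart.prt');
    }
    header('Content-Type: text/plain');
    echo $contents;
}
```

strpos() and substr() work on raw bytes, so the binary parts of the file do not interfere. Once the XML is isolated, it can be parsed with simplexml_load_string().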

Initial Answer

Just a guess here, but it might be that you're opening the file in binary mode (the "rb" in your first line of code). Try opening it as a plain text file (use "r" instead of "rb").

More likely, it's an encoding issue where PHP is trying to decode a UTF-8 file as ASCII, for instance. Since you are opening a binary file (I'm assuming a CAD design file is mostly binary with a little XML), PHP might be getting confused while trying to detect the file's encoding. I would need a copy of the file to know for sure.

Try comparing the result of mb_detect_encoding:

mb_detect_encoding($contents)

and the actual encoding of the XML data within the .prt file. If they are different, that's how you know that PHP is using the wrong encoding. In that case, use mb_convert_encoding to convert from PHP's detected encoding to that of the XML data.
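A minimal sketch of that detect-then-convert check, assuming the mbstring extension is loaded. The helper name to_utf8(), the candidate encoding list, and the sample string are all hypothetical; in practice you would pass the XML portion of the file rather than the whole binary .prt, since arbitrary binary data is rarely valid UTF-8 and detection on it says little.

```php
<?php
// Sketch, assuming the mbstring extension. to_utf8() is a hypothetical helper
// and the candidate list is an assumption; adjust it to the encodings the
// XML inside the .prt file might actually use.

function to_utf8(string $bytes, array $candidates = ['UTF-8', 'ISO-8859-1']): string
{
    // strict = true: a candidate is only accepted if $bytes is valid in it,
    // so invalid UTF-8 falls through to the next candidate.
    $detected = mb_detect_encoding($bytes, $candidates, true);
    if ($detected === false) {
        throw new RuntimeException('Bytes are not valid in any candidate encoding');
    }
    if ($detected === 'UTF-8') {
        return $bytes; // already UTF-8, nothing to convert
    }
    // Convert from the detected encoding to UTF-8.
    return mb_convert_encoding($bytes, 'UTF-8', $detected);
}

// Hypothetical sample: an XML fragment saved as ISO-8859-1 ("\xE9" is é).
$sample = "<name>Caf\xE9</name>";
echo mb_detect_encoding($sample, ['UTF-8', 'ISO-8859-1'], true), PHP_EOL; // ISO-8859-1
echo to_utf8($sample), PHP_EOL;
```

ISO-8859-1 is listed last on purpose: every byte sequence is valid ISO-8859-1, so it acts as a catch-all and would hide a genuine UTF-8 match if it came first.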
