While working on some InfoPath and workflow code I got bitten again by the byte order mark (BOM), and I felt I should document what’s going on. I was getting the exception “Data at the root level is invalid. Line 1, position 1.” Here’s why:
The XML encoding that InfoPath uses is UTF-8, and the file starts with a byte order mark: the three bytes EF BB BF, which decode to the single character U+FEFF. If you decode the bytes to a string yourself, that character stays at the front of the string, and XmlDocument.LoadXml is confused by it. It expects the XML declaration or root element to begin at the very first character of the string you provide. It’s simple to deal with, but frustrating to find. This code creates a new XmlDocument, extracts the file contents from SharePoint, and loads them into the document.
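XmlDocument wfDoc = new XmlDocument();
byte[] fileBytes = wfFile.OpenBinary();
// Decoding does not drop the BOM; it ends up as the character U+FEFF at index 0.
string fileAsString = (new System.Text.UTF8Encoding()).GetString(fileBytes);
// Skip the byte order mark at the beginning of the file.
// This assumes the BOM is always present; without one it would cut off the leading '<'.
wfDoc.LoadXml(fileAsString.Substring(1));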
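Since Substring(1) blindly drops the first character, a slightly more defensive variant checks for the mark first. This is only a sketch: the BomHelper class and its method names are made up for illustration and are not part of any SharePoint or .NET API.

```csharp
using System;
using System.Xml;

// Hypothetical helper (the names are illustrative, not from any library).
public static class BomHelper
{
    // U+FEFF is the character that the UTF-8 bytes EF BB BF decode to.
    private const char ByteOrderMark = '\uFEFF';

    // Returns the text without a leading BOM; text without a BOM is returned unchanged.
    public static string StripBom(string text)
    {
        if (!string.IsNullOrEmpty(text) && text[0] == ByteOrderMark)
        {
            return text.Substring(1);
        }
        return text;
    }

    // Loads XML text into a new XmlDocument, tolerating a leading BOM.
    public static XmlDocument LoadXmlTolerant(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(StripBom(xml));
        return doc;
    }
}
```

With that in place, the last line of the snippet above could become wfDoc.LoadXml(BomHelper.StripBom(fileAsString)); and it would keep working if a file ever shows up without the mark.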
A better approach is to hand off the stream to XmlDocument:
XmlDocument wfDoc = new XmlDocument();
using (Stream wfFileStrm = wfFile.OpenBinaryStream())
{
    wfDoc.Load(wfFileStrm);
}
This loads fine without stripping the byte order mark, because XmlDocument.Load works out the encoding from the stream itself and consumes the mark. In my case, though, it isn’t an option: SharePoint sandboxed code doesn’t allow the System.IO.Stream type.
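If Stream is off the table, another way to stay with plain byte arrays is to skip the three BOM bytes before decoding, so the mark never reaches the string at all. A minimal sketch, assuming the file really is UTF-8; the Utf8Decoder class and its method names are hypothetical, and I haven’t verified it inside the sandbox.

```csharp
using System;
using System.Text;

// Hypothetical helper (the names are illustrative); it only touches byte[] and string.
public static class Utf8Decoder
{
    // The three bytes that make up the UTF-8 byte order mark.
    private static readonly byte[] Utf8Bom = { 0xEF, 0xBB, 0xBF };

    // Decodes UTF-8 bytes to a string, skipping the BOM bytes when they are present.
    public static string DecodeWithoutBom(byte[] bytes)
    {
        if (bytes == null)
        {
            throw new ArgumentNullException("bytes");
        }
        int offset = HasUtf8Bom(bytes) ? Utf8Bom.Length : 0;
        return Encoding.UTF8.GetString(bytes, offset, bytes.Length - offset);
    }

    // True when the array starts with EF BB BF.
    private static bool HasUtf8Bom(byte[] bytes)
    {
        if (bytes.Length < Utf8Bom.Length)
        {
            return false;
        }
        for (int i = 0; i < Utf8Bom.Length; i++)
        {
            if (bytes[i] != Utf8Bom[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

The result of Utf8Decoder.DecodeWithoutBom(wfFile.OpenBinary()) can then be passed straight to wfDoc.LoadXml.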
2 Comments
Ah the beloved BOM. We’ve all been there :)
Reminds me of a handy extension method I wrote for detecting and removing ‘the mark’
http://blogs.edwardwilde.com/2009/09/09/xslcompiledtransform-the-utf-8-bom-systemxmlxmlexception-data-at-the-root-level-is-invalid-line-1-position-1/
Thanks… helped