How to Use POIFS Browser to Read Microsoft OLE Files

Microsoft's OLE (Object Linking and Embedding) Compound File Binary Format, often called OLE CF, Compound File, or Structured Storage, is a container format used by many older Microsoft document types (e.g., legacy Word, Excel, and PowerPoint files) and by some proprietary application files. POIFS Browser is a tool that lets you inspect and extract the storages and streams inside these OLE files. This article explains what OLE files are and why you might inspect them, then walks step by step through using POIFS Browser effectively, with practical troubleshooting tips and programmatic alternatives.


What is an OLE (Compound File) and why inspect it?

An OLE Compound File behaves like a filesystem inside a single file: it contains a directory of “storages” (like folders) and “streams” (like files). Each stream holds binary or text data, and storages group related streams together. Common reasons to inspect OLE files:

  • Recover embedded data (images, text, embedded spreadsheets).
  • Investigate file corruption or repair malformed documents.
  • Reverse-engineer proprietary formats that use OLE for packaging.
  • Extract metadata or forensic artifacts (timestamps, author info).
  • Learn how older Microsoft document formats store content.
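To make the "filesystem inside a file" idea concrete: every compound file begins with a 512-byte header that records the sector size, the number of FAT sectors, and where the directory and MiniFAT start. The sketch below decodes a few of those fields using the offsets from Microsoft's [MS-CFB] specification. The `CfbHeader` class is a hypothetical illustration built only on the JDK; it is not part of POIFS Browser or Apache POI.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical helper: decodes selected fields of a compound file's 512-byte header (MS-CFB). */
final class CfbHeader {
    // Every compound file starts with this 8-byte signature.
    static final byte[] SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    final int majorVersion;         // 3 (512-byte sectors) or 4 (4096-byte sectors)
    final int sectorSize;           // 1 << sector shift (offset 30)
    final int miniSectorSize;       // 1 << mini sector shift (offset 32), normally 64
    final int fatSectorCount;       // number of FAT sectors (offset 44)
    final int firstDirectorySector; // start of the directory chain (offset 48)
    final int miniStreamCutoff;     // streams smaller than this use the mini stream (offset 56)
    final int firstMiniFatSector;   // start of the MiniFAT chain (offset 60)

    private CfbHeader(ByteBuffer b) {
        majorVersion = b.getShort(26) & 0xFFFF;
        sectorSize = 1 << (b.getShort(30) & 0xFFFF);
        miniSectorSize = 1 << (b.getShort(32) & 0xFFFF);
        fatSectorCount = b.getInt(44);
        firstDirectorySector = b.getInt(48);
        miniStreamCutoff = b.getInt(56);
        firstMiniFatSector = b.getInt(60);
    }

    static CfbHeader parse(byte[] header) {
        if (header.length < 512) {
            throw new IllegalArgumentException("header must be at least 512 bytes");
        }
        if (!Arrays.equals(Arrays.copyOf(header, 8), SIGNATURE)) {
            throw new IllegalArgumentException("not an OLE compound file (bad signature)");
        }
        // All multi-byte header fields are little-endian.
        ByteBuffer b = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        if ((b.getShort(28) & 0xFFFF) != 0xFFFE) {
            throw new IllegalArgumentException("unexpected byte-order mark");
        }
        return new CfbHeader(b);
    }
}
```

To try it on a real file, read its first 512 bytes (for example with `InputStream.readNBytes(512)` on Java 11+) and pass them to `CfbHeader.parse`; a typical version 3 file reports a 512-byte sector size and a 4096-byte mini stream cutoff.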

POIFS Browser is a user-friendly explorer for that internal structure. It lets you view the directory tree, open streams (as text or hex), extract streams to disk, and inspect properties like stream sizes and timestamps.


Installing and launching POIFS Browser

POIFS Browser is commonly provided as a standalone tool or bundled with libraries (for example, projects around Apache POI). Installation methods vary by distribution; typical options:

  • Download a prebuilt executable or jar from the project’s release page.
  • If it’s a Java jar, run:
    
    java -jar POIFSBrowser.jar 
  • On platforms with package managers, install via the relevant package if available.

When you launch POIFS Browser it usually opens a simple GUI with a file-open dialog and a left-hand tree view representing storages and streams.


Opening an OLE file

  1. File → Open (or click the folder icon).
  2. Select the .doc, .xls, .ppt or other compound file. POIFS Browser will parse the compound file and display the root storage and its children in the tree view.
  3. If parsing fails, the file may not be a valid OLE container or may be corrupted. Try opening a known-good example to confirm the tool is functioning.
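When a file refuses to open, its first few bytes usually tell you why. This minimal sketch (the `ContainerSniffer` class and its `Kind` enum are hypothetical names) checks for the 8-byte OLE signature `D0 CF 11 E0 A1 B1 1A E1` and for the `PK\3\4` local-file header that begins a ZIP-based Office Open XML file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper: guesses the container type from a file's leading bytes. */
final class ContainerSniffer {
    enum Kind { OLE2, ZIP, UNKNOWN }

    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // "PK\3\4": the local file header that starts a non-empty ZIP archive.
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    static Kind sniff(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) return Kind.OLE2;
        if (startsWith(head, ZIP_MAGIC)) return Kind.ZIP;
        return Kind.UNKNOWN;
    }

    static Kind sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(8)); // readNBytes(int) needs Java 11+
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

Note that a password-protected .docx or .xlsx will report `OLE2`: Office stores encrypted Open XML packages inside a compound file, so an OLE browser can open the container even though the document content stays encrypted.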

Reading the tree view

  • The root entry is named “Root Entry” (the name the format specification requires). Under it you’ll see storages and streams.
  • Storages appear like folders and can contain nested storages/streams.
  • Streams are leaf nodes representing actual data (e.g., “Workbook”, “WordDocument”, “SummaryInformation”, or custom stream names).
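Each node in that tree corresponds to a 128-byte directory entry. As a sketch of what a browser reads in order to draw the tree, the hypothetical `DirEntry` class below decodes one entry using the [MS-CFB] layout: a UTF-16LE name, an object type, sibling and child IDs, a starting sector, and a size. It assumes you already have the raw directory bytes (in a version 3 file, each 512-byte directory sector holds four entries) and leaves out walking the sibling tree.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: decodes one 128-byte MS-CFB directory entry. */
final class DirEntry {
    static final int UNALLOCATED = 0, STORAGE = 1, STREAM = 2, ROOT = 5;

    final String name;
    final int type;
    final int leftSibling, rightSibling, child; // entry IDs; 0xFFFFFFFF (-1) means "none"
    final int startSector;                      // first sector (or mini sector) of the stream
    final long size;                            // stream size in bytes

    DirEntry(byte[] raw, int offset) {
        // View the 128 bytes at 'offset' as a little-endian buffer indexed from 0.
        ByteBuffer b = ByteBuffer.wrap(raw, offset, 128).slice().order(ByteOrder.LITTLE_ENDIAN);
        int nameBytes = b.getShort(64) & 0xFFFF;            // includes the 2-byte null terminator
        int usable = Math.max(0, Math.min(nameBytes, 64) - 2);
        name = new String(raw, offset, usable, StandardCharsets.UTF_16LE);
        type = b.get(66) & 0xFF;
        leftSibling = b.getInt(68);
        rightSibling = b.getInt(72);
        child = b.getInt(76);
        startSector = b.getInt(116);
        size = b.getLong(120);
    }

    String kind() {
        switch (type) {
            case STORAGE: return "storage";
            case STREAM: return "stream";
            case ROOT: return "root storage";
            default: return "unallocated/unknown";
        }
    }
}
```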

Tip: Look for standard stream names:

  • Word documents: “WordDocument”, “1Table”/“0Table”, “Data”
  • Excel: “Workbook” or “Book”, plus “CompObj”, “SummaryInformation”
  • PowerPoint: “PowerPoint Document”, “Current User”, and often “Pictures” (slides are records inside “PowerPoint Document”, not separate streams)
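Those names are often enough for a first guess at the format. A minimal sketch, assuming you already have the names of the root's children (for example from the Apache POI listing example later in this article); the `OleFormatGuesser` class and its labels are hypothetical.

```java
import java.util.Set;

/** Hypothetical helper: guesses a legacy Office format from root-level entry names. */
final class OleFormatGuesser {
    static String guess(Set<String> rootNames) {
        if (rootNames.contains("WordDocument")) return "Word (.doc)";
        if (rootNames.contains("Workbook")) return "Excel 97-2003 (.xls, BIFF8)";
        if (rootNames.contains("Book")) return "Excel 5.0/95 (.xls, BIFF5)";
        if (rootNames.contains("PowerPoint Document")) return "PowerPoint (.ppt)";
        return "unknown";
    }

    /** Strips leading control characters, e.g. the 0x05 prefix on property-set stream names. */
    static String displayName(String raw) {
        int i = 0;
        while (i < raw.length() && raw.charAt(i) < 0x20) i++;
        return raw.substring(i);
    }
}
```

Names such as “SummaryInformation” and “CompObj” are actually stored with a leading control character (0x05 and 0x01 respectively), which is why `displayName` drops characters below 0x20 before showing a name.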

Viewing stream contents

POIFS Browser typically supports multiple views:

  • Text view: attempts to decode the stream as text (useful for ASCII/UTF-16 or XML-based streams).
  • Hex view: shows raw bytes in hexadecimal with an ASCII column — essential for binary streams.
  • Derived viewers: some tools can render images or extract OLE-embedded objects.

How to approach different stream types:

  • If the stream contains readable text or XML, use Text view to copy meaningful content.
  • If it’s binary (e.g., a portion of a Word document), use Hex view and search for known signatures or patterns.
  • For UTF-16 text (common in older Word streams), try toggling encoding to view readable characters.
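Both views are easy to reproduce outside the GUI, which helps when comparing output or scripting a search. This hypothetical `HexView` sketch prints 16 bytes per row with an offset and an ASCII column, and decodes the same bytes as UTF-16LE, which is essentially what toggling the encoding does.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper mimicking a hex view and a UTF-16LE text view. */
final class HexView {
    static String dump(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int row = 0; row < data.length; row += 16) {
            sb.append(String.format("%08x  ", row)); // offset column
            StringBuilder ascii = new StringBuilder();
            for (int i = row; i < row + 16; i++) {
                if (i < data.length) {
                    int v = data[i] & 0xFF;
                    sb.append(String.format("%02x ", v));
                    // Printable ASCII is shown as-is; everything else as '.'.
                    ascii.append(v >= 0x20 && v < 0x7F ? (char) v : '.');
                } else {
                    sb.append("   "); // pad a short final row so columns line up
                }
            }
            sb.append(' ').append(ascii).append('\n');
        }
        return sb.toString();
    }

    /** Decodes bytes as UTF-16LE, the encoding used for much legacy Word text. */
    static String utf16le(byte[] data) {
        return new String(data, StandardCharsets.UTF_16LE);
    }
}
```

For the UTF-16LE bytes of “Hi”, the hex row reads `48 00 69 00` and the ASCII column shows `H.i.`; that alternating `00` pattern is the usual hint that a stream holds UTF-16LE text.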

Extracting streams and storages

  • Right-click a stream → Export/Save → choose output path and filename.
  • For storages, some tools allow exporting the entire storage tree into separate files or reconstructing embedded files.
  • Extracted streams can be opened with appropriate apps (e.g., saved images open in an image viewer, saved document streams may require reconstruction into a valid file).

Example workflow to extract embedded image:

  1. Navigate to a stream with image-like data (look for common image headers like PNG 89 50 4E 47 or JPEG FF D8).
  2. Export stream as .png or .jpg and open with an image viewer.
  3. If extraction yields a chunk that doesn’t open, it may be wrapped or fragmented — further analysis required.
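Steps 1 and 2 can be automated by scanning the stream bytes for signatures. This is a minimal carving sketch (`ImageCarver` is a hypothetical name): it looks first for the full 8-byte PNG signature `89 50 4E 47 0D 0A 1A 0A`, then for JPEG's `FF D8 FF`, and copies from the match onward. For PNG it stops after the `IEND` chunk and its CRC; for JPEG it keeps everything to the end of the stream, relying on most decoders to stop at the end-of-image marker. Wrapped or fragmented data (step 3) is not handled.

```java
import java.util.Arrays;

/** Hypothetical helper: carves the first PNG (or, failing that, JPEG) found in a byte array. */
final class ImageCarver {
    static final byte[] PNG = { (byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };
    static final byte[] JPEG = { (byte) 0xFF, (byte) 0xD8, (byte) 0xFF };
    static final byte[] IEND = { 0x49, 0x45, 0x4E, 0x44 }; // type of the final PNG chunk

    /** Naive byte-pattern search starting at 'from'; returns -1 if not found. */
    static int indexOf(byte[] data, byte[] pattern, int from) {
        outer:
        for (int i = Math.max(0, from); i <= data.length - pattern.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    /** Returns the carved image bytes, or null if no known signature is present. */
    static byte[] carve(byte[] data) {
        int png = indexOf(data, PNG, 0);
        if (png >= 0) {
            int iend = indexOf(data, IEND, png + PNG.length);
            // "IEND" is followed by a 4-byte CRC; if IEND is missing, keep everything to the end.
            int end = iend >= 0 ? Math.min(data.length, iend + 8) : data.length;
            return Arrays.copyOfRange(data, png, end);
        }
        int jpg = indexOf(data, JPEG, 0);
        if (jpg >= 0) {
            return Arrays.copyOfRange(data, jpg, data.length);
        }
        return null;
    }
}
```

Write the returned bytes to a file with the matching extension (for example with `Files.write`) and open it in an image viewer as in step 2.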

Working with property sets and metadata

OLE files can contain property streams like “SummaryInformation” and “DocumentSummaryInformation” that store metadata (title, author, last saved, creation/modification times). POIFS Browser typically decodes common property sets; if not, you can export the raw stream and use a library (e.g., Apache POI) to parse properties programmatically.
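Timestamps in property sets (such as the last-saved time) and in directory entries are stored as Windows FILETIME values: a 64-bit count of 100-nanosecond intervals since 1601-01-01 UTC. If you parse raw bytes yourself, a conversion like the following sketch is needed; it uses only `java.time`, and the `FileTimes` name is hypothetical.

```java
import java.time.Instant;

/** Hypothetical helper: converts Windows FILETIME values to java.time.Instant. */
final class FileTimes {
    // Seconds between 1601-01-01T00:00:00Z (FILETIME epoch) and 1970-01-01T00:00:00Z.
    static final long EPOCH_DIFF_SECONDS = 11_644_473_600L;

    static Instant toInstant(long fileTime) {
        long seconds = Math.floorDiv(fileTime, 10_000_000L);      // whole seconds since 1601
        long hundredNanos = Math.floorMod(fileTime, 10_000_000L); // leftover 100 ns ticks
        return Instant.ofEpochSecond(seconds - EPOCH_DIFF_SECONDS, hundredNanos * 100);
    }

    /** Reads the 8-byte little-endian value (low DWORD first) stored on disk. */
    static long fromLittleEndian(byte[] b, int off) {
        long v = 0;
        for (int i = 7; i >= 0; i--) {
            v = (v << 8) | (b[off + i] & 0xFFL);
        }
        return v;
    }
}
```

A zero FILETIME maps to 1601-01-01, so an "author saved this in 1601" result usually means the field was simply never set.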


Repairing and recovering content

If a document is corrupted:

  • Inspect directory entries to confirm stream sizes and existence of expected streams (e.g., “WordDocument”).
  • If a key stream is missing or truncated, attempt to extract remaining streams and rebuild a document using a library.
  • Some tools attempt low-level repair by reconstructing FAT or MiniFAT tables inside the compound file; POIFS Browser may or may not have that capability.
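The chain checks a repair tool performs can be sketched in plain Java. Assuming the FAT has already been read into an `int[]` (one 32-bit entry per sector, with `0xFFFFFFFE` marking end of chain, as in [MS-CFB]), the hypothetical `FatChain` helper follows a stream's sectors and flags typical corruption symptoms: sector numbers outside the FAT, loops, and chains too short for the size recorded in the directory entry.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: follows and validates a FAT sector chain. */
final class FatChain {
    static final int ENDOFCHAIN = 0xFFFFFFFE; // -2 as a signed int

    /** Returns the sector numbers of the chain starting at 'start'. */
    static List<Integer> follow(int[] fat, int start, long expectedSize, int sectorSize) {
        List<Integer> chain = new ArrayList<>();
        boolean[] seen = new boolean[fat.length];
        int sector = start;
        while (sector != ENDOFCHAIN) {
            // Free (-1) and other special markers, or numbers past the FAT, mean a broken chain.
            if (sector < 0 || sector >= fat.length) {
                throw new IllegalStateException("invalid sector number in chain: " + sector);
            }
            if (seen[sector]) {
                throw new IllegalStateException("loop detected at sector " + sector);
            }
            seen[sector] = true;
            chain.add(sector);
            sector = fat[sector];
        }
        long needed = (expectedSize + sectorSize - 1) / sectorSize; // ceiling division
        if (chain.size() < needed) {
            throw new IllegalStateException(
                "chain has " + chain.size() + " sectors but the size needs " + needed);
        }
        return chain;
    }

    /** File offset of a regular sector: the header occupies the first sector-sized slot. */
    static long offsetOf(int sector, int sectorSize) {
        return (long) (sector + 1) * sectorSize;
    }
}
```

`offsetOf` reflects the format's convention that sector 0 starts right after the header, so regular sector n begins at byte (n + 1) × sector size.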

Forensic tip: copy the file before attempting any writes or repairs.


Reading OLE files programmatically (Apache POI example)

If you prefer code, Apache POI provides APIs for reading OLE2 compound files through its POIFS component (POIFSFileSystem). A minimal Java example that lists every storage and stream:

    import java.io.FileInputStream;
    import org.apache.poi.poifs.filesystem.*;

    public class ListOleStreams {
        public static void main(String[] args) throws Exception {
            // Constructing from an InputStream buffers the whole file; try-with-resources closes both.
            try (FileInputStream fis = new FileInputStream("example.doc");
                 POIFSFileSystem fs = new POIFSFileSystem(fis)) {
                DirectoryNode root = fs.getRoot();
                listEntries(root, "");
            }
        }

        // Recursively print every entry; storages (directories) get a trailing "/".
        static void listEntries(DirectoryNode dir, String indent) {
            for (Entry entry : dir) {
                System.out.println(indent + entry.getName() + (entry instanceof DirectoryNode ? "/" : ""));
                if (entry instanceof DirectoryNode) {
                    listEntries((DirectoryNode) entry, indent + "  ");
                }
            }
        }
    }

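To extract a stream to disk, run the following inside the same try block, while the POIFSFileSystem (and therefore root) is still open:

    import java.io.FileOutputStream;
    import org.apache.poi.poifs.filesystem.*;

    // getEntry throws FileNotFoundException if no entry has this name.
    DocumentEntry de = (DocumentEntry) root.getEntry("WordDocument");
    try (DocumentInputStream dis = new DocumentInputStream(de);
         FileOutputStream fos = new FileOutputStream("WordDocument.stream")) {
        byte[] buf = new byte[8192];
        int r;
        // Copy in 8 KB chunks until read() returns -1 (end of stream).
        while ((r = dis.read(buf)) != -1) {
            fos.write(buf, 0, r);
        }
    }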

Common pitfalls and troubleshooting

  • Not an OLE file: newer Office files (.docx, .xlsx, .pptx) normally use Office Open XML, which is a ZIP package rather than an OLE container. Open those with a zip tool.
  • Encrypted or password-protected: streams may be encrypted; extraction may yield unintelligible bytes.
  • Mini streams: streams smaller than the mini stream cutoff (normally 4096 bytes) are stored in 64-byte mini sectors tracked by the MiniFAT rather than the regular FAT. Use a tool or library that supports the MiniFAT.
  • Character encodings: many Word streams use UTF-16LE; viewing with the wrong encoding produces gibberish.
  • Fragmented data: embedded objects might be split; reconstructing them can be complex.
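The MiniFAT rule is where hand-written parsers most often go wrong. Per [MS-CFB], a stream smaller than the header's cutoff lives in 64-byte mini sectors inside the mini stream, and the mini stream itself is an ordinary FAT chain that starts at the root entry's starting sector. The sketch below shows only the location arithmetic; `StreamLocator` is a hypothetical name, and it assumes the mini stream's chain of regular sectors has already been resolved.

```java
import java.util.List;

/** Hypothetical helper: works out where a small stream's data lives in the file. */
final class StreamLocator {
    /** Streams strictly smaller than the cutoff are allocated from the mini stream. */
    static boolean usesMiniStream(long streamSize, int miniStreamCutoff) {
        return streamSize < miniStreamCutoff;
    }

    /**
     * File offset of mini sector 'miniSector'.
     * miniStreamChain: the regular sectors that hold the mini stream, in order.
     * An out-of-range index (IndexOutOfBoundsException) suggests a truncated mini stream.
     */
    static long miniSectorOffset(int miniSector, List<Integer> miniStreamChain,
                                 int sectorSize, int miniSectorSize) {
        long posInMiniStream = (long) miniSector * miniSectorSize;
        int chainIndex = (int) (posInMiniStream / sectorSize);   // which regular sector
        long withinSector = posInMiniStream % sectorSize;        // offset inside it
        int regularSector = miniStreamChain.get(chainIndex);
        return (long) (regularSector + 1) * sectorSize + withinSector; // skip the header slot
    }
}
```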

Practical examples

  • Extract metadata: open SummaryInformation stream and read title/author fields.
  • Recover embedded images: search for image headers in streams, export to image files.
  • Investigate macros: look for the “Macros” storage (Word) or the “_VBA_PROJECT_CUR” storage (Excel) to analyze VBA code (use caution: macros may be malicious).
  • For developers: use Apache POI to programmatically extract streams, convert legacy formats to modern equivalents, or batch-process multiple files.

Security considerations

  • Do not open suspicious files on an internet-connected, production machine — use an isolated VM.
  • Extracted macros or executables can be malicious; scan with antivirus or inspect in a safe environment.
  • Keep backups of originals before modifying files.

Conclusion

POIFS Browser is a practical tool for inspecting, extracting, and troubleshooting Microsoft OLE compound files. For simple inspection and extraction, its GUI is often sufficient; for bulk processing or automation, pair it with libraries such as Apache POI. Understanding OLE’s storage and stream model (storages, streams, MiniFAT) makes it easier to recover data, analyze metadata, or reverse-engineer older document formats.
