RTF forensics

Rich Text Format — older but still common; minimal metadata but distinctive container signals.

Rich Text Format (.rtf) is an older Microsoft document format that predates DOCX by roughly two decades (RTF dates from 1987, DOCX from 2007). It's a plain-text container with formatting represented as control codes (\b for bold, \par for paragraph break, etc.).
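
To make the control-code idea concrete, here is a minimal Python sketch that lists the control words in a small RTF fragment. The regular expression, the function name control_words, and the sample string are illustrative assumptions, not Autotend Forensics' actual parser; a full RTF reader also has to handle groups, escaped characters, and binary data.

```python
import re
from typing import Optional

# A control word is a backslash, lowercase letters, and an optional signed
# number. One trailing space, if present, is a delimiter and not content.
CONTROL_WORD = re.compile(r"\\([a-z]+)(-?\d+)? ?")

def control_words(rtf: str) -> list[tuple[str, Optional[int]]]:
    """Return (name, parameter) pairs in document order."""
    return [
        (m.group(1), int(m.group(2)) if m.group(2) else None)
        for m in CONTROL_WORD.finditer(rtf)
    ]

# Made-up fragment: "bold" is switched on with \b and off with \b0.
sample = r"{\rtf1\ansi Plain \b bold\b0  text.\par}"
print(control_words(sample))
```

Everything the tokenizer does not match (braces aside) is body text, which is why the format stays readable in any text editor.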

RTF still appears as a submission format from:

  • Older word processors that default to RTF (some legacy WordPerfect, AbiWord, or specific TextEdit configurations).
  • Cross-platform conversion workflows where DOCX wasn't an option.
  • Specific accessibility tools that prefer RTF as a portable intermediate.

Signals available on RTF

  • Header metadata. RTF carries an \info group with Author, Title, Subject, Operator (last modified by), Keywords, Edit time, and Revision count. Less comprehensive than DOCX but the basics are there.
  • Generator string. RTF files carry a \rtf1 version number and often an embedded generator hint (Microsoft Word, TextEdit, AbiWord, Pandoc, etc.). The exact control-code patterns are distinctive per generator.
  • Body text extraction. Linguistic signals run on the text after we strip control codes.
  • Embedded objects. RTF supports \object for OLE embeds; less common in student work but worth noting when present.
  • Code-page declaration. RTF specifies its code page explicitly. Mismatches between declared and actual encodings are diagnostically useful.
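
As a rough illustration of the metadata signals listed above, the following Python sketch pulls the \info fields, the generator hint, and the declared code page out of raw RTF text with regular expressions. The function name read_rtf_metadata, the output keys, and the sample document (including the "ExampleWriter 1.0" generator) are hypothetical. Mapping \version to the revision count and \edmins to edit time is an assumption about how those fields are usually written.

```python
import re

TEXT_FIELDS = ("author", "title", "subject", "operator", "keywords")
# RTF control word -> key used in the output record (key names are assumptions).
NUMERIC_FIELDS = {"edmins": "edit_minutes", "version": "version", "ansicpg": "code_page"}

def read_rtf_metadata(rtf: str) -> dict:
    """Build a flat metadata record from raw RTF source text."""
    meta = {}
    for name in TEXT_FIELDS:
        # Matches e.g. {\author Some Name}; nested groups are not handled.
        m = re.search(r"\{\\" + name + r" ?([^{}]*)\}", rtf)
        if m:
            meta[name] = m.group(1).strip()
    for word, key in NUMERIC_FIELDS.items():
        m = re.search(r"\\" + word + r"(\d+)", rtf)
        if m:
            meta[key] = int(m.group(1))
    # The generator hint lives in an ignorable destination: {\*\generator ...;}
    m = re.search(r"\{\\\*\\generator ?([^{}]*)\}", rtf)
    if m:
        meta["generator"] = m.group(1).strip().rstrip(";")
    return meta

# Made-up document for illustration only.
sample = (
    r"{\rtf1\ansi\ansicpg1252\deff0"
    r"{\*\generator ExampleWriter 1.0;}"
    r"{\info{\title Essay draft}{\author A. Student}{\operator A. Student}"
    r"{\version3}{\edmins42}}"
    r"\pard Body text.\par}"
)
print(read_rtf_metadata(sample))
```

Fields that the file does not declare are simply absent from the record, which keeps "missing" distinguishable from "empty".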

Signals not available on RTF

RTF files typically lack:

  • Tracked changes (technically supported but rare in practice; most editors don't expose them via RTF).
  • Hidden text in the OOXML sense (RTF has \v for hidden text, but most editors don't write it).
  • Embedded fonts (RTF references fonts by name only).
  • Comment threads (some implementations support them, but most don't preserve them through round-trips).
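
Because these features are rare rather than impossible, a scan can still check for them. The sketch below, in Python, flags the control words for tracked changes (\revised, \deleted), hidden text (\v), and comments (\annotation). The function name rare_features, the label strings, and the sample text are illustrative assumptions; the check does not guard against escaped backslashes in body text.

```python
import re

CONTROL_WORD = re.compile(r"\\([a-z]+)(-?\d+)?")

# Control word -> human-readable label (labels are assumptions).
RARE_FEATURES = {
    "revised": "tracked insertion",
    "deleted": "tracked deletion",
    "v": "hidden text",
    "annotation": "comment",
}

def rare_features(rtf: str) -> set:
    """Return labels for rarely written features found in the RTF source."""
    found = set()
    for m in CONTROL_WORD.finditer(rtf):
        name, param = m.group(1), m.group(2)
        # A parameter of 0 (as in \v0) switches the property off, so skip it.
        if name in RARE_FEATURES and param != "0":
            found.add(RARE_FEATURES[name])
    return found

# Made-up fragment containing hidden text and a tracked insertion.
sample = r"{\rtf1\ansi Visible {\v hidden note} text {\revised added later}.\par}"
print(sorted(rare_features(sample)))
```

An empty result is the expected case for most student submissions; a non-empty one is worth a closer manual look.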

Common false-positive paths

  • TextEdit-saved RTF (macOS default) has a distinctive generator string and lots of control-code overhead that can look unusual against the Word-RTF baseline.
  • Pandoc-generated RTF has a structural fingerprint that's consistent and identifiable but not suspicious — Pandoc is legitimate authoring tooling.
  • WordPerfect → RTF conversion workflows leave their own fingerprint, which is likewise not suspicious in itself.

What's distinctive about RTF review

RTF is the least signal-rich format Autotend Forensics handles. If a student submits RTF when DOCX or PDF was an option, the forensic signal floor is genuinely lower: fewer fields, less structural variation. The right response is often to request a different format for high-stakes review.

That said, RTF has one durable advantage: its plain-text nature means the body content is easy to read and confirm visually. Pasted blocks tend to leave obvious formatting control-code signatures.
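
One way to picture a paste signature: split the body on \par, record the explicit font (\f), size (\fs), and colour (\cf) codes in each paragraph, and flag paragraphs that differ from the most common combination. The Python sketch below does that under simplifying assumptions. The function name odd_blocks and the sample text are hypothetical, and the sketch ignores formatting inherited from earlier paragraphs, so it is an illustration and not the tool's actual detector.

```python
import re
from collections import Counter

# Explicit character-formatting codes: font index, font size, colour index.
FORMAT_WORD = re.compile(r"\\(f|fs|cf)(\d+)")
# \par ends a paragraph; the lookahead avoids splitting on \pard.
PARAGRAPH_END = re.compile(r"\\par(?![a-z])")

def odd_blocks(body: str) -> list:
    """Return indexes of paragraphs whose formatting differs from the majority."""
    blocks = [b for b in PARAGRAPH_END.split(body) if b.strip()]
    signatures = [tuple(sorted(FORMAT_WORD.findall(b))) for b in blocks]
    if not signatures:
        return []
    baseline = Counter(signatures).most_common(1)[0][0]
    return [i for i, sig in enumerate(signatures) if sig != baseline]

# Made-up body: the third paragraph uses a different font, size, and colour.
sample = (
    r"\pard\f0\fs24 First paragraph typed by hand.\par "
    r"\pard\f0\fs24 Second paragraph, same style.\par "
    r"\pard\f2\fs21\cf3 A block pasted from elsewhere.\par "
    r"\pard\f0\fs24 Closing paragraph.\par"
)
print(odd_blocks(sample))
```

A flagged paragraph is only a prompt for review: headings, block quotes, and code samples legitimately use different formatting.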

What to expect

A typical authored RTF scan surfaces:

  • Header \info group as a flat metadata record.
  • Generator string identification.
  • Linguistic signals on extracted body text.
  • Per-control-code-block formatting inventory (useful for paste detection).

For low-stakes review RTF is fine. For high-stakes review, request the source format (DOCX, Pages, ODT) when possible.

Scan an RTF document now.

Free, browser-only, no signup. Autotend Forensics runs entirely in your browser.
