Rich Text Format — older but still common; minimal metadata but distinctive container signals.
RTF forensics
Rich Text Format (.rtf) is an older Microsoft document format that predates DOCX by roughly two decades (RTF dates from 1987; DOCX became Word's default with Office 2007). It's a plain-text container with formatting represented as control codes (\b for bold, \par for a paragraph break, and so on).
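Because an RTF file is plain text with inline control codes, recovering the body text (what the linguistic signals described below run on) is mostly a matter of walking the tokens and dropping everything that is formatting. The Python sketch below is a simplified illustration under stated assumptions, not a description of how Autotend Forensics does it. The function name `rtf_to_text`, the partial list of skipped destination groups, and the assumption of the default `\uc1` (one fallback character after each `\uN` escape) are ours. It ignores tables, fields, and most special characters.

```python
import re

# One RTF token per match: a control word with optional signed numeric
# parameter (one trailing space is a delimiter and is consumed), a hex-escaped
# byte (\'hh), a control symbol (backslash + one non-letter), a brace, or text.
TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"
    r"|\\'([0-9a-fA-F]{2})"
    r"|\\([^a-zA-Z])"
    r"|([{}])"
    r"|([^\\{}]+)"
)

# Destination groups whose contents are not body text (partial, assumed list).
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict",
                     "object", "header", "footer"}


def rtf_to_text(rtf: str, codepage: str = "cp1252") -> str:
    out = []
    stack = []            # skip state saved at each '{'
    skip = False          # True while inside a destination we ignore
    group_start = False   # True when the previous token was '{'
    fallback = 0          # fallback chars still to drop after a \uN escape
    for m in TOKEN.finditer(rtf):
        word, param, hexbyte, symbol, brace, text = m.groups()
        if brace == "{":
            stack.append(skip)
            group_start = True
            continue
        if brace == "}":
            skip = stack.pop() if stack else False
            group_start = False
            continue
        first_in_group, group_start = group_start, False
        if skip:
            continue
        if symbol == "*" and first_in_group:
            skip = True                      # {\* ...} marks an optional destination
        elif word in SKIP_DESTINATIONS and first_in_group:
            skip = True
        elif word in ("par", "line"):
            out.append("\n")
        elif word == "tab":
            out.append("\t")
        elif word == "u" and param is not None:
            code = int(param)                # negative values encode code points > 32767
            out.append(chr(code + 65536 if code < 0 else code))
            fallback = 1                     # assumes the default \uc1
        elif hexbyte is not None:
            if fallback:
                fallback -= 1                # this byte is the \uN fallback; drop it
            else:
                out.append(bytes([int(hexbyte, 16)]).decode(codepage, errors="replace"))
        elif symbol in ("\\", "{", "}"):
            out.append(symbol)               # escaped literal characters
        elif symbol == "~":
            out.append("\u00a0")             # non-breaking space
        elif text is not None:
            text = text.replace("\r", "").replace("\n", "")  # raw newlines are not content
            if fallback:
                dropped = min(fallback, len(text))
                text, fallback = text[dropped:], fallback - dropped
            out.append(text)
        # Any other control word (\b, \fs24, \pard, ...) is formatting only.
    return "".join(out)
```

Run on a small document with a font table and an `\info` group, it returns only the visible prose, with `\'e9` decoded through the code page and each `\par` turned into a newline.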
RTF still appears as a submission format from:
- Older word processors that default to RTF (some legacy WordPerfect setups, AbiWord, or specific TextEdit configurations).
- Cross-platform conversion workflows where DOCX wasn't an option.
- Specific accessibility tools that prefer RTF as a portable intermediate.
Signals available on RTF
- Header metadata. RTF carries an \info group with Author, Title, Subject, Operator (last modified by), Keywords, Edit time, and Revision count. It is less comprehensive than DOCX, but the basics are there.
- Generator string. RTF files carry an \rtf1 version number and often an embedded generator hint (Microsoft Word, TextEdit, AbiWord, Pandoc, etc.). The exact control-code patterns are distinctive per generator.
- Body text extraction. Linguistic signals run on the text after the control codes are stripped.
- Embedded objects. RTF supports \object for OLE embeds; these are less common in student work but worth noting when present.
- Code-page declaration. RTF declares its code page explicitly (for example \ansicpg1252). Mismatches between the declared and the actual encoding are diagnostically useful.
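A flat metadata record of this kind can be pulled out with a few regular expressions. This is a hedged Python sketch: `read_info` and its output keys are hypothetical names, it only reads simple `{\title ...}`-style groups without nested braces, and it maps `\ansicpgNNNN` onto Python's `cpNNNN` codec names, falling back to Latin-1 when Python has no codec by that name. The raw-character count reflects the convention that a standard RTF file is 7-bit ASCII with other characters escaped, so raw 8-bit characters hint at a writer that bypassed its own declaration.

```python
import codecs
import re

TEXT_FIELDS = ("title", "subject", "author", "operator", "keywords")
NUMERIC_FIELDS = ("version", "edmins", "nofwords")  # version number, edit minutes, word count


def read_info(rtf: str) -> dict:
    """Flatten the info group and code-page declaration into one record (sketch)."""
    record = {}
    cpg = re.search(r"\\ansicpg(\d+)", rtf)
    codepage = f"cp{cpg.group(1)}" if cpg else "cp1252"  # cp1252 default is an assumption
    record["codepage"] = codepage
    try:
        codecs.lookup(codepage)
    except LookupError:
        record["codepage_unknown"] = True
        codepage = "latin-1"  # decode something rather than fail

    def decode(value: str) -> str:
        # Turn \'hh escapes into characters using the declared code page.
        return re.sub(
            r"\\'([0-9a-fA-F]{2})",
            lambda m: bytes([int(m.group(1), 16)]).decode(codepage, errors="replace"),
            value,
        ).strip()

    for name in TEXT_FIELDS:
        # Simple groups only, e.g. {\author Jane Doe}; nested braces are not handled.
        m = re.search(r"\{\\" + name + r"(?![a-zA-Z])\s?([^{}]*)\}", rtf)
        if m:
            record[name] = decode(m.group(1))
    for name in NUMERIC_FIELDS:
        m = re.search(r"\\" + name + r"(\d+)", rtf)
        if m:
            record[name] = int(m.group(1))
    # Standard RTF is 7-bit ASCII; raw 8-bit characters bypass the declaration.
    record["raw_non_ascii"] = sum(1 for ch in rtf if ord(ch) > 127)
    return record
```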
Signals not available on RTF
RTF is missing:
- Tracked changes (technically supported but rare in practice; most editors don't expose them via RTF).
- Hidden text in the OOXML sense (RTF has \v for hidden text, but most editors don't write it).
- Embedded fonts (in practice, RTF references fonts by name only).
- Comment threads (some implementations support them, but most don't preserve them through round-trips).
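Because these features are rare in authored RTF, their presence is worth surfacing when it does occur. A minimal Python sketch, assuming the feature-to-control-word mapping below (`\v` for hidden text, `\revised` and `\deleted` for revision marks, `\annotation` for comments, `\object` for OLE embeds); `rare_feature_counts` is a hypothetical helper that only counts occurrences and does not interpret them.

```python
import re
from collections import Counter

# Control words tied to features that are rarely written in practice.
RARE_FEATURES = {
    "v": "hidden text",
    "revised": "tracked insertion",
    "deleted": "tracked deletion",
    "annotation": "comment",
    "object": "embedded OLE object",
}

# A control word with optional parameter, or a control symbol such as \\ or \{.
# Matching control symbols stops an escaped backslash from being misread
# as the start of a control word.
CONTROL = re.compile(r"\\([a-zA-Z]+)(-?\d+)?|\\[^a-zA-Z]")


def rare_feature_counts(rtf: str) -> Counter:
    counts = Counter()
    for m in CONTROL.finditer(rtf):
        word, param = m.group(1), m.group(2)
        # A 0 parameter turns a toggle property such as \v off, so skip it.
        if word in RARE_FEATURES and param != "0":
            counts[RARE_FEATURES[word]] += 1
    return counts
```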
Common false-positive paths
- TextEdit-saved RTF (macOS default) has a distinctive generator string and lots of control-code overhead that can look unusual against the Word-RTF baseline.
- Pandoc-generated RTF has a structural fingerprint that's consistent and identifiable but not suspicious — Pandoc is legitimate authoring tooling.
- WordPerfect → RTF workflows have their own fingerprint.
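Identifying the generator first keeps legitimate tooling from reading as an anomaly. The Python sketch below is a deliberately thin heuristic: it reads an explicit `{\*\generator ...}` destination when a writer includes one, and it recognizes the `\cocoartf` control word that the macOS Cocoa text system (used by TextEdit) writes in its header. It does not fingerprint Word, Pandoc, or WordPerfect output; those would need the per-generator control-code patterns described above. The function name and return labels are hypothetical.

```python
import re


def guess_generator(rtf: str) -> str:
    """Rough generator hint from the RTF header (heuristic sketch)."""
    head = rtf[:4096]  # assumption: generator hints appear near the top of the file
    m = re.search(r"\{\\\*\\generator\s+([^;}]*)", head)
    if m:
        return m.group(1).strip()  # explicit {\*\generator ...} destination
    if re.search(r"\\cocoartf\d+", head):
        return "Cocoa text system (e.g. macOS TextEdit)"
    return "unknown"
```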
What's distinctive about RTF review
RTF is the least-signal-rich format Autotend Forensics handles. If a student submits RTF and DOCX/PDF were options, the forensic signal floor is genuinely lower — fewer fields, less structural variation. The right response is often to request a different format for high-stakes review.
That said, RTF has one durable advantage: its plain-text nature means the body content is easy to read and confirm visually. Pasted blocks tend to leave obvious formatting control-code signatures.
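One way to make paste signatures concrete is to inventory the character formatting in effect for each paragraph and flag paragraphs that differ from the document's usual font and size. The Python sketch below assumes that font (`\fN`) and size (`\fsN`) are enough of a signature, treats `\plain` as resetting both to "unset", and skips only a few header destinations; `paragraph_formats` and `flag_outliers` are hypothetical names. A flagged paragraph is a prompt for review, not evidence: headings, block quotes, and captions legitimately change formatting.

```python
import re
from collections import Counter

# Control word (+ param, one delimiter space), control symbol, brace, or text.
TOKEN = re.compile(r"\\([a-zA-Z]+)(-?\d+)? ?|\\[^a-zA-Z]|([{}])|([^\\{}]+)")
SKIP = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}


def paragraph_formats(rtf: str) -> list:
    """One set of (font, size) pairs per paragraph, respecting group scope."""
    state = {"f": None, "fs": None}
    stack, paras, current = [], [], set()
    skip, group_start = False, False
    for m in TOKEN.finditer(rtf):
        word, param, brace, text = m.groups()
        if brace == "{":
            stack.append((dict(state), skip))  # groups scope formatting
            group_start = True
            continue
        if brace == "}":
            if stack:
                state, skip = stack.pop()
            group_start = False
            continue
        first, group_start = group_start, False
        if skip:
            continue
        if first and (word in SKIP or m.group(0) == "\\*"):
            skip = True                          # header or optional destination
        elif word in ("f", "fs") and param is not None:
            state[word] = int(param)
        elif word == "plain":
            state["f"], state["fs"] = None, None  # simplification: treated as unset
        elif word == "par":
            paras.append(current)
            current = set()
        elif text is not None and text.strip():
            current.add((state["f"], state["fs"]))
    if current:
        paras.append(current)
    return paras


def flag_outliers(paras: list) -> list:
    """Indices of non-empty paragraphs whose formatting differs from the most common."""
    sigs = [frozenset(p) for p in paras if p]
    if not sigs:
        return []
    common, _ = Counter(sigs).most_common(1)[0]
    return [i for i, p in enumerate(paras) if p and frozenset(p) != common]
```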
What to expect
A typical authored RTF scan surfaces:
- Header \info group as a flat metadata record.
- Generator string identification.
- Linguistic signals on extracted body text.
- Per-control-code-block formatting inventory (useful for paste detection).
For low-stakes review RTF is fine. For high-stakes review, request the source format (DOCX, Pages, ODT) when possible.
Scan an RTF document now.
Free, browser-only, no signup. Autotend Forensics runs entirely in your browser.