
DOCX forensics

Microsoft Word's OOXML container — the format Autotend Forensics surfaces the most signals on.

DOCX is where Autotend Forensics surfaces the most signals

Microsoft Word's OOXML container — the file extension .docx (plus its macro-enabled sibling .docm) — is the format most student submissions arrive in, and the one our detectors are calibrated against most heavily.

A DOCX file is a ZIP archive containing a word/document.xml body, a docProps/core.xml metadata record, a docProps/app.xml application record, and a handful of supporting parts (word/settings.xml, word/styles.xml, embedded fonts, media, themes, custom XML).
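
Because the container is an ordinary ZIP archive, its parts can be listed with any ZIP library. The sketch below is a minimal illustration using Python's standard zipfile module, not Autotend Forensics' own code. The helper names (list_parts, missing_parts, build_sample) and the decision to treat the three parts named above as "expected" are assumptions made for this example.

```python
import io
import zipfile

# Parts named in the text above; treating them as "expected" is an assumption
# of this sketch, not a rule taken from the OOXML specification.
EXPECTED_PARTS = ("word/document.xml", "docProps/core.xml", "docProps/app.xml")


def list_parts(docx_bytes: bytes) -> list[str]:
    """Return every ZIP entry name in the order it is stored in the archive."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return archive.namelist()


def missing_parts(docx_bytes: bytes) -> list[str]:
    """Return the expected parts that are absent from the archive."""
    present = set(list_parts(docx_bytes))
    return [name for name in EXPECTED_PARTS if name not in present]


def build_sample(names: list[str]) -> bytes:
    """Build a tiny in-memory archive with placeholder parts (demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, "<x/>")
    return buffer.getvalue()


sample = build_sample(["[Content_Types].xml", "word/document.xml", "docProps/core.xml"])
print(list_parts(sample))
print(missing_parts(sample))  # docProps/app.xml was left out of this sample
```

For a real file, pass the bytes of the .docx (for example from open(path, "rb").read()) instead of the generated sample.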

Signals available on DOCX

  • Metadata: full inventory of Creator / Last modified by / Created / Modified / TotalTime / Pages / Words / Revision / Application / AppVersion / Template.
  • Edit history: tracked-changes residue, metadata left by an accept-all pass, hidden-text blocks, comment threads (including resolved ones), per-paragraph revision marks.
  • Paste detection: formatting-run inventory, language-tag flips, edit-time vs. word-count ratio.
  • Font / encoding: full font fallback chains, character-set reads, embedded font hashes.
  • AI signals: every linguistic signal we run, applied to the extracted body text.
  • Structural: full ZIP-entry inventory, export-source fingerprint (Microsoft Word vs. Google Docs export vs. LibreOffice vs. Pandoc), embedded-object path leaks, custom XML residue, compression-level anomalies.
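
The metadata fields in the first bullet live in docProps/core.xml and docProps/app.xml. As a rough sketch of how they can be read (an illustration with Python's standard library, not a description of how Autotend Forensics does it), the code below pulls each field out by its XML name. The function name read_metadata and the hand-written sample parts are assumptions made for the example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Standard OOXML namespaces for core and extended (application) properties.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}
CORE_FIELDS = {
    "Creator": "dc:creator",
    "Last modified by": "cp:lastModifiedBy",
    "Created": "dcterms:created",
    "Modified": "dcterms:modified",
    "Revision": "cp:revision",
}
APP_FIELDS = ("TotalTime", "Pages", "Words", "Application", "AppVersion", "Template")


def read_metadata(docx_bytes: bytes) -> dict:
    """Collect the fields listed above; anything not found maps to None."""
    found = {label: None for label in [*CORE_FIELDS, *APP_FIELDS]}
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        names = set(archive.namelist())
        if "docProps/core.xml" in names:
            core = ET.fromstring(archive.read("docProps/core.xml"))
            for label, path in CORE_FIELDS.items():
                found[label] = core.findtext(path, namespaces=NS)
        if "docProps/app.xml" in names:
            app = ET.fromstring(archive.read("docProps/app.xml"))
            for tag in APP_FIELDS:
                found[tag] = app.findtext(f"ep:{tag}", namespaces=NS)
    return found


# Hand-written sample parts, for demonstration only.
SAMPLE_CORE = (
    '<cp:coreProperties xmlns:cp="' + NS["cp"] + '" xmlns:dc="' + NS["dc"] + '">'
    "<dc:creator>A. Student</dc:creator><cp:revision>3</cp:revision>"
    "</cp:coreProperties>"
)
SAMPLE_APP = (
    '<Properties xmlns="' + NS["ep"] + '">'
    "<TotalTime>42</TotalTime><Words>1500</Words>"
    "</Properties>"
)

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docProps/core.xml", SAMPLE_CORE)
    archive.writestr("docProps/app.xml", SAMPLE_APP)

metadata = read_metadata(buffer.getvalue())
print(metadata["Creator"], metadata["Revision"], metadata["TotalTime"])
```

All values come back as strings exactly as stored; converting TotalTime or Words to integers is left to the caller.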

Common false-positive paths on DOCX

  • Google Docs → DOCX export trips the export-source mismatch flag unless Google Docs is an official part of the student's submission flow.
  • iWork Pages → DOCX export sets Application = "Microsoft Word" (Pages writes that value into the export) while the ZIP shape says Pages. This is the most common source of confusion.
  • Old assignment templates with hidden printer instructions can trip the hidden-text detector. Familiarize yourself with your own templates' residue.
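
To check a template for hidden-text residue yourself, one option is to look for runs marked with the WordprocessingML w:vanish property in word/document.xml. The sketch below does only that: it inspects direct run formatting and ignores hidden text inherited from styles, so treat it as a starting point under those assumptions and not as the detector Autotend Forensics uses. The function name hidden_text_runs and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def hidden_text_runs(document_xml: str) -> list[str]:
    """Return the text of every run whose own properties mark it as hidden."""
    root = ET.fromstring(document_xml)
    hidden = []
    for run in root.iter(f"{{{W}}}r"):
        vanish = run.find(f"{{{W}}}rPr/{{{W}}}vanish")
        if vanish is None:
            continue
        # A bare <w:vanish/> means "on"; an explicit off value disables it.
        if vanish.get(f"{{{W}}}val", "true") in ("0", "false", "off"):
            continue
        text = "".join(t.text or "" for t in run.iter(f"{{{W}}}t"))
        if text:
            hidden.append(text)
    return hidden


# Hand-written sample body, for demonstration only.
SAMPLE = (
    f'<w:document xmlns:w="{W}"><w:body><w:p>'
    "<w:r><w:t>Visible essay text.</w:t></w:r>"
    "<w:r><w:rPr><w:vanish/></w:rPr><w:t>Print on tray 2.</w:t></w:r>"
    '<w:r><w:rPr><w:vanish w:val="0"/></w:rPr><w:t>Not hidden.</w:t></w:r>'
    "</w:p></w:body></w:document>"
)

print(hidden_text_runs(SAMPLE))
```

Running this against a blank copy of your own assignment template shows which hidden strings every submission based on it will carry.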

What to expect

A typical authored DOCX scan surfaces:

  • Metadata fields: ~25 fields populated, ~10 useful for review.
  • Edit-history flags: usually 0; 1 or more requires inspection.
  • Font inventory: 2–4 families on a normal essay; 6 or more suggests text assembled from multiple sources.
  • AI-signal flags: usually 0–2 flags of the stock-opener type on formal-register prose; treat these as low-confidence.
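
The font-inventory figure can be approximated by counting the distinct font names referenced in w:rFonts elements of word/document.xml. The sketch below is an assumption-laden illustration: it ignores fonts declared only in word/styles.xml, the font table, or theme references, so its count may differ from what Autotend Forensics reports. The names font_families and font_reading are hypothetical, and the bands simply restate the figures in the list above.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
# Attributes of w:rFonts that carry an explicit font name.
FONT_ATTRS = ("ascii", "hAnsi", "eastAsia", "cs")


def font_families(document_xml: str) -> set[str]:
    """Return the distinct font names referenced directly in the body."""
    root = ET.fromstring(document_xml)
    families = set()
    for rfonts in root.iter(f"{{{W}}}rFonts"):
        for attr in FONT_ATTRS:
            name = rfonts.get(f"{{{W}}}{attr}")
            if name:
                families.add(name)
    return families


def font_reading(family_count: int) -> str:
    """Map a family count onto the bands quoted in the text above."""
    if family_count >= 6:
        return "suggests multi-source"
    if 2 <= family_count <= 4:
        return "normal range"
    return "no guidance stated"


# Hand-written sample body, for demonstration only.
SAMPLE = (
    f'<w:document xmlns:w="{W}"><w:body><w:p>'
    '<w:r><w:rPr><w:rFonts w:ascii="Calibri" w:hAnsi="Calibri"/></w:rPr>'
    "<w:t>First run.</w:t></w:r>"
    '<w:r><w:rPr><w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman"/></w:rPr>'
    "<w:t>Second run.</w:t></w:r>"
    "</w:p></w:body></w:document>"
)

families = font_families(SAMPLE)
print(sorted(families), font_reading(len(families)))
```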

Pair the DOCX-specific signals with the document-level scorecard in Autotend Forensics for a full review.

Frequently asked

Is a high revision count a sign of effort?
Not directly. Revision count records the number of saves, not the amount of writing. A 50-revision essay may come from a careful writer with autosave on; a 1-revision essay may be a paste-and-submit. Pair revision count with total edit time and word count for a useful read.
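
One simple way to pair those numbers is to divide the word count by the recorded editing time (TotalTime in docProps/app.xml is stored in minutes). The sketch below is only an illustration of that ratio; the function name is hypothetical, the example figures are invented, and no cut-off value is implied. A high ratio is a prompt to look closer, not a verdict.

```python
from typing import Optional


def words_per_edit_minute(words: int, total_time_minutes: int) -> Optional[float]:
    """Words in the document per minute of recorded editing time.

    Returns None when no editing time is recorded, since the ratio is
    undefined in that case.
    """
    if total_time_minutes <= 0:
        return None
    return words / total_time_minutes


# Invented figures, for demonstration only.
careful = words_per_edit_minute(words=1500, total_time_minutes=300)
pasted = words_per_edit_minute(words=1500, total_time_minutes=2)
print(careful, pasted)  # 5.0 750.0
```
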
What's the difference between Application = Word and Application = WPS Office?
Different word processors produce DOCX files with slightly different structural fingerprints even when the Application field is the same. Kingsoft WPS Office, Polaris Office, OnlyOffice, and others write DOCX that Microsoft Word can open, but the ZIP shape is distinctive. Autotend Forensics surfaces the actual producer via structural fingerprint regardless of the Application claim.
If I see 'Accept all changes' residue, does that prove the student copy-pasted from an AI?
No. It proves only that the document went through a tracked-changes accept-all step at some point. That happens legitimately when a student accepts editor suggestions, Grammarly fixes, or feedback from a peer. The signal is stronger when paired with low original edit time, an accept-all timestamp shortly before submission, and a name on the accepted changes that isn't the student's.
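
How Autotend Forensics recovers accept-all residue is not described here. As a related and much simpler check, the sketch below inventories revision marks that are still present in word/document.xml (w:ins and w:del elements, each carrying w:author and w:date) and lists authors other than the student. Once a change has been accepted these elements are normally gone, so this covers only unaccepted marks. The function names and sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def revision_marks(document_xml: str) -> list[dict]:
    """List insertions and deletions still tracked in the body."""
    root = ET.fromstring(document_xml)
    marks = []
    for kind in ("ins", "del"):
        for element in root.iter(f"{{{W}}}{kind}"):
            marks.append({
                "kind": kind,
                "author": element.get(f"{{{W}}}author"),
                "date": element.get(f"{{{W}}}date"),
            })
    return marks


def foreign_authors(marks: list[dict], student_name: str) -> list[str]:
    """Return the sorted revision authors whose name differs from the student's."""
    return sorted({m["author"] for m in marks
                   if m["author"] and m["author"] != student_name})


# Hand-written sample body, for demonstration only.
SAMPLE = (
    f'<w:document xmlns:w="{W}"><w:body><w:p>'
    '<w:ins w:id="1" w:author="Editor X" w:date="2024-05-01T10:00:00Z">'
    "<w:r><w:t>added</w:t></w:r></w:ins>"
    '<w:del w:id="2" w:author="A. Student" w:date="2024-05-01T10:05:00Z">'
    "<w:r><w:delText>removed</w:delText></w:r></w:del>"
    "</w:p></w:body></w:document>"
)

marks = revision_marks(SAMPLE)
print(len(marks), foreign_authors(marks, "A. Student"))
```
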
Why do Google Docs DOCX exports look different from native Word DOCX?
Google Docs' DOCX exporter writes the ZIP entries in a different order than Word, uses different default compression levels, and produces slightly different docProps fields. Autotend Forensics reports an 'export-source mismatch' when the Application field doesn't match the structural fingerprint. A mismatch is not necessarily suspicious, since many students draft in Docs and submit as DOCX, but it's worth surfacing.
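
Entry order and compression can be inspected directly. The sketch below profiles each ZIP entry in stored order using Python's zipfile module. Note the limits: the ZIP format records the compression method for each entry but not the exact level used, so the compressed-to-original size ratio is shown here only as a rough proxy. The function name entry_profile and the sample archive are assumptions for the example, and no producer signatures are implied.

```python
import io
import zipfile

METHOD_NAMES = {zipfile.ZIP_STORED: "stored", zipfile.ZIP_DEFLATED: "deflated"}


def entry_profile(docx_bytes: bytes) -> list[dict]:
    """Describe each ZIP entry in stored order: name, method, size ratio."""
    profile = []
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        for info in archive.infolist():
            ratio = info.compress_size / info.file_size if info.file_size else 1.0
            profile.append({
                "name": info.filename,
                "method": METHOD_NAMES.get(info.compress_type, "other"),
                "ratio": round(ratio, 3),
            })
    return profile


def build_sample() -> bytes:
    """Build a two-entry archive with a deliberate order (demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", "<x/>" * 200,
                         compress_type=zipfile.ZIP_DEFLATED)
        archive.writestr("[Content_Types].xml", "<x/>",
                         compress_type=zipfile.ZIP_STORED)
    return buffer.getvalue()


profile = entry_profile(build_sample())
for entry in profile:
    print(entry)
```

Comparing the profile of a submitted file against profiles of files you exported yourself from Word and from Google Docs is one way to see the difference described above.
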

Scan a DOCX document now.

Free, browser-only, no signup. Autotend Forensics runs entirely in your browser.
