Metadata signals

Author, timestamps, application, edit time, revision count — the fields written automatically every time a document is saved.

What document metadata is

Every modern document format — DOCX, PDF, PPTX, XLSX, ODT — writes a small structured record of "how this file was authored" every time you save it. That record is the metadata: who the author was, when the file was created, when it was last modified, how many minutes were spent editing, what application produced it, and a few dozen other automatically-set fields.

You don't have to do anything special to read it. Most editors expose at least a subset of these fields under File → Properties or Get Info. Autotend Forensics extracts the complete set directly from the file structure — no upload, no parsing heuristics, just the bytes the document already carries.
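As a rough illustration of what reading the fields "directly from the file structure" can look like, here is a minimal Python sketch for DOCX. A DOCX file is a ZIP package, and the core properties conventionally live in the part docProps/core.xml. The function name read_core_properties and the output labels are assumptions made for this example; this is not a description of how Autotend Forensics is implemented.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml in OOXML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Output label (hypothetical) -> element path inside core.xml.
CORE_FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
    "revision": "cp:revision",
}


def read_core_properties(docx_bytes: bytes) -> dict:
    """Return the core metadata fields of a DOCX; missing fields map to None."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        # Assumes the conventional part name; a stricter reader would
        # resolve it through the package relationships instead.
        if "docProps/core.xml" not in package.namelist():
            return {label: None for label in CORE_FIELDS}
        root = ET.fromstring(package.read("docProps/core.xml"))
    result = {}
    for label, path in CORE_FIELDS.items():
        element = root.find(path, NS)
        result[label] = element.text if element is not None else None
    return result
```

Passing the raw bytes of a file (for example, the result of reading a hypothetical essay.docx in binary mode) returns a dictionary with the five labels above. Values are left as the strings stored in the file so they can be compared against what an editor shows under File → Properties.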

What the fields tell you

The fields most useful for academic-integrity review are:

  • Creator / Author — the user name registered in the editor when the file was first created. Stays as the original even after copies and exports.
  • Last modified by — the user name when the file was last saved. Diverges from Creator when the file was passed between people.
  • Created / Modified timestamps — when the document was first authored and most recently saved. Useful for spotting documents that were created moments before the assignment due time.
  • Edit time — total minutes (or seconds) the document was open in an editor while changes were being made. Implausibly low edit time on a long document is a strong signal that text came from somewhere else.
  • Revision count / "Saves" — how many times the file was saved. A 2,000-word essay with a revision count of 1 is consistent with the author opening the document, typing (or pasting) everything, and saving once.
  • Application + Version — which editor produced the file (Microsoft Word vs. LibreOffice vs. Google Docs export vs. Apple Pages). The combination can identify the exact build.
  • Template — Word records the template the document was derived from. Class assignment templates leave a residue here.
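For DOCX, the application-level fields in the list above (application, version, template, edit time) are conventionally stored in a second part, docProps/app.xml. The sketch below reads a handful of them. The function name read_app_properties and the helper _to_int are hypothetical, and a given file may omit any of these elements, so every field is treated as optional.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Default namespace of docProps/app.xml (OOXML extended properties).
EP = "{http://schemas.openxmlformats.org/officeDocument/2006/extended-properties}"

WANTED = ("Application", "AppVersion", "Template", "TotalTime", "Words")


def _to_int(text):
    """Convert a stored value to int, or None if it is absent or malformed."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return None


def read_app_properties(docx_bytes: bytes) -> dict:
    """Return application-level fields of a DOCX; absent fields map to None."""
    values = dict.fromkeys(WANTED)
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        if "docProps/app.xml" in package.namelist():
            root = ET.fromstring(package.read("docProps/app.xml"))
            for name in WANTED:
                element = root.find(EP + name)
                if element is not None:
                    values[name] = element.text
    # TotalTime (editing time in minutes) and Words are numeric when present.
    for name in ("TotalTime", "Words"):
        values[name] = _to_int(values[name])
    return values
```

Keeping missing values as None rather than 0 matters for review: an edit time of zero minutes and an absent edit-time field are different observations.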

What metadata cannot tell you

Metadata is circumstantial evidence. It tells you what the editor recorded about the document's authoring session, not what actually happened. Common ways the signal misleads:

  1. Edits-in-place can erase suspicious fields. Saving the file under a new account changes Last modified by; copying the text into a fresh document, or running the file through a "fix metadata" tool, resets Creator as well.
  2. Cloud-doc exports carry the export tool's signature, not the original author's. A Google Doc exported to DOCX shows "Microsoft Word" as the application because the Doc → DOCX converter on Google's side is writing that field.
  3. Format conversions lose data. PDF metadata is sparse; converting from DOCX → PDF drops most of the useful fields.
  4. Manual edits to metadata are technically possible. Most students won't bother, but anyone determined enough can.

For these reasons, Autotend Forensics never asserts "this document was AI-generated" or "this is plagiarized" from metadata alone. The metadata signals inform a conversation with the student. They are not a verdict.

What we surface

Autotend Forensics extracts the complete metadata record and flags fields whose combination is unusual — for example:

  • Low edit time relative to word count (a 2,000-word essay edited in 7 minutes).
  • Creator and Last modified by differ (file was authored on a different machine/account than the one that saved it).
  • Application is a known AI-content-export tool (e.g. ChatGPT's copy-to-DOCX path leaves a specific fingerprint).
  • Creation timestamp is suspiciously close to the assignment due time (within minutes of submission).

Each flag links back to the specific field and the value observed, so you can verify in your own editor before raising it.
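The kind of combination check described above can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the thresholds (10 minutes per 1,000 words, a 30-minute window before the due time), the field labels, and the function name flag_metadata are hypothetical and are not the rules Autotend Forensics applies. Each flag carries the observed values so it can be verified by hand.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds, chosen only to make the example concrete.
MIN_MINUTES_PER_1000_WORDS = 10
DUE_WINDOW = timedelta(minutes=30)


def flag_metadata(fields: dict, due: datetime) -> list:
    """Return human-readable flags for unusual field combinations."""
    flags = []

    # Low edit time relative to word count.
    words = fields.get("words")
    minutes = fields.get("edit_minutes")
    if words and minutes is not None:
        expected = words / 1000 * MIN_MINUTES_PER_1000_WORDS
        if minutes < expected:
            flags.append(f"low edit time: {minutes} min for {words} words")

    # Creator and Last modified by differ.
    creator = fields.get("creator")
    modifier = fields.get("last_modified_by")
    if creator and modifier and creator != modifier:
        flags.append(f"creator differs: {creator!r} vs {modifier!r}")

    # Created shortly before the due time. `created` and `due` must both be
    # naive or both be timezone-aware datetimes.
    created = fields.get("created")
    if created is not None and timedelta(0) <= due - created <= DUE_WINDOW:
        flags.append(f"created close to due time: {created.isoformat()}")

    return flags
```

With these assumed thresholds, the 2,000-word essay edited in 7 minutes from the list above is flagged because 7 is below the 20 minutes the threshold expects. A flag is a prompt to look at the field, not a conclusion.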

Frequently asked

Can I trust document metadata as proof a student cheated?
No. Metadata is circumstantial: it tells you what the editor recorded, not what actually happened. Use it as a starting point for a conversation, paired with edit-history and paste-detection signals, not as a verdict on its own.

What does it mean when 'Creator' and 'Last modified by' are different?
The file was authored on one account and saved on another. Common benign reasons: editing across home and lab computers, OS user-name changes, shared family Word installs. Suspicious when paired with low edit time and a tight submission timestamp.

Why does my student's essay show Application = Microsoft Word when they say they used Pages?
Apple Pages writes 'Microsoft Word' into the Application field when it exports to DOCX, because the export format declares itself as Word. The structural ZIP shape still reveals Pages; Autotend Forensics catches this mismatch and surfaces it.

Can metadata be edited?
Yes. Anyone with a hex editor or a metadata-stripping tool can modify fields. Most students will not bother. If metadata is suspiciously clean (no Creator, no application, no edit time) on a student submission, treat that itself as a signal.

How is edit time recorded?
Microsoft Word tracks total minutes the document was open and being modified. Time spent with the file open but idle, or open in a different application, isn't counted. Edit time is most useful in combination with revision count and word count: implausibly low numbers across all three are the strongest signal.
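
The "suspiciously clean" case from the FAQ above (no Creator, no application, no edit time) is simple to check once the fields have been extracted. The sketch below is a minimal illustration; the field labels and the function names missing_fields and looks_stripped are hypothetical.

```python
# Fields that, per the FAQ, are normally present on a student submission.
EXPECTED_FIELDS = ("creator", "application", "edit_minutes")


def missing_fields(fields: dict) -> list:
    """List the expected fields that are absent or blank."""
    return [name for name in EXPECTED_FIELDS if fields.get(name) in (None, "")]


def looks_stripped(fields: dict) -> bool:
    """True only when every expected field is missing."""
    return len(missing_fields(fields)) == len(EXPECTED_FIELDS)
```

An edit time of 0 counts as present here, since a recorded zero is a different observation from a field that was removed.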
