Methodology
What we look at, and why.
Autotend Forensics surfaces signals that may be consistent with paste-from-elsewhere, metadata anomalies, AI-assisted writing, or other authorship concerns. We never assert a verdict. Every signal we surface has a methodology page below — what it looks at, what it can detect, and what it commonly misreads.
Metadata
Author, timestamps, application, edit time, revision count — the fields written automatically every time a document is saved.
Reading the Application field in DOCX app.xml
The Application field in a docx tells you what program saved the file. It's almost always present, almost always reliable, and one of the simplest forensics signals to read. Here's what each common value means and what to make of an unexpected one.
5 min read
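As a taste of what the article covers: a .docx is an ordinary ZIP container, and the Application value lives in the docProps/app.xml part. A minimal Python sketch (the function name is ours, purely illustrative) that reads it with only the standard library:

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by docProps/app.xml (OOXML extended properties).
NS = "{http://schemas.openxmlformats.org/officeDocument/2006/extended-properties}"

def read_application(docx_file):
    """Return the Application string a .docx declares, or None if absent."""
    with zipfile.ZipFile(docx_file) as z:
        if "docProps/app.xml" not in z.namelist():
            # Some exporters and converters omit app.xml entirely --
            # the absence is itself worth noting.
            return None
        root = ET.fromstring(z.read("docProps/app.xml"))
    app = root.find(f"{NS}Application")
    return app.text if app is not None else None
```

A file saved by desktop Word typically reports "Microsoft Office Word"; other producers write their own strings or nothing at all.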
'Created' vs 'Modified' timestamps in a docx — what each one means
Word writes two timestamps into every docx — one for when the document was first created, one for when it was last saved. They look interchangeable. They aren't. Here's exactly how each is set, what they tell you together, and how to read them in context.
5 min read
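Both timestamps live in docProps/core.xml as Dublin Core terms (dcterms:created and dcterms:modified), stored as W3CDTF/ISO-8601 strings. A minimal sketch of pulling them out (function name is ours, for illustration):

```python
import zipfile
import xml.etree.ElementTree as ET

# Dublin Core terms namespace used in docProps/core.xml.
DCTERMS = "{http://purl.org/dc/terms/}"

def read_core_timestamps(docx_file):
    """Return (created, modified) as ISO-8601 strings, or None where missing."""
    with zipfile.ZipFile(docx_file) as z:
        root = ET.fromstring(z.read("docProps/core.xml"))
    created = root.find(f"{DCTERMS}created")
    modified = root.find(f"{DCTERMS}modified")
    return (
        created.text if created is not None else None,
        modified.text if modified is not None else None,
    )
```

Reading the pair together is the point: a modified time only minutes after created, on a long document, reads very differently from a gap of weeks.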
PDF Producer field — a guide for instructors
Every PDF carries a Producer field declaring what program generated it. The value is one of the most useful single signals about how a PDF submission came together — whether it was exported from Word, generated by an LLM tool, printed to PDF, scanned, or re-saved through a converter. Here's what the common values mean.
6 min read
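For a quick look at the raw value, you don't need a PDF library at all. The sketch below is a deliberate heuristic (our own, not a complete parser): it scans the raw bytes for a literal-string /Producer entry, and will miss Producer values stored in XMP metadata, hex strings, or encrypted Info dictionaries.

```python
import re

def read_pdf_producer(pdf_bytes):
    """Best-effort scan of raw PDF bytes for a /Producer (...) entry.

    Heuristic only: misses XMP-stored, hex-encoded, or encrypted
    Producer values. Returns the first match, or None.
    """
    m = re.search(rb"/Producer\s*\(([^)]*)\)", pdf_bytes)
    return m.group(1).decode("latin-1", errors="replace") if m else None
```

Typical values you'll see in the wild include Word's own exporter, "Skia/PDF" (Chrome's print-to-PDF engine), and various converter libraries.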
'Last Modified By' in a docx — what it means and what it doesn't
A docx file records who last saved it. That sounds like an answer to "who wrote this?", but the field has subtle behavior that makes it easy to misread. Here's exactly what it tracks, how it can be spoofed, and what it's actually useful for.
5 min read
The EditingDuration field in DOCX — what it actually measures
Word's "total editing time" field is one of the most-cited and most-misread numbers in document forensics. It measures something specific. It is not "how long the student worked on this." Here's what it does measure, what it doesn't, and how to read it in context.
6 min read
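The counter is written to docProps/app.xml as a TotalTime element, in whole minutes. A minimal sketch of reading it (function name ours, illustrative only):

```python
import zipfile
import xml.etree.ElementTree as ET

# Same extended-properties part that holds the Application field.
NS = "{http://schemas.openxmlformats.org/officeDocument/2006/extended-properties}"

def read_total_edit_minutes(docx_file):
    """Return Word's TotalTime counter in whole minutes, or None if absent."""
    with zipfile.ZipFile(docx_file) as z:
        root = ET.fromstring(z.read("docProps/app.xml"))
    node = root.find(f"{NS}TotalTime")
    return int(node.text) if node is not None and node.text else None
```

Treat the number as what it is: a counter one program maintains under specific conditions, not a work diary.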
What your professor actually sees when they scan your essay
If your school uses a document forensics tool, your submitted file is being inspected — not just its content. Here's exactly what shows up, what each signal means, what raises flags, and how to check your own document before submitting.
8 min read
What docx metadata reveals about a document
A docx file carries dozens of hidden fields — author, edit time, application, revision history — that often tell a different story than the visible content. Here's what's in there, what each field means, and what it can't tell you.
7 min read
Edit history
Revision marks, tracked changes, hidden text, residue from accepted suggestions.
Paste detection
Large unbroken text blocks, missing typing tempo, mismatched formatting signatures.
AI-assisted writing signals
Patterns commonly found in AI-assisted writing — surfaced as signals, never asserted as verdicts.
Structural
ZIP-shape oddities, embedded objects, file-system path leaks, export-source fingerprints.
Excel formula vs cell-style consistency as a forensics signal
Two Excel files can have identical visible values and look interchangeable, yet carry radically different formula histories underneath. A scan that compares how the formulas relate to the cell styles can reveal whether a spreadsheet was actually built or just typed over the top of someone else's work.
5 min read
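One concrete version of that check: in the worksheet XML inside the .xlsx container (e.g. xl/worksheets/sheet1.xml), a cell that was computed carries an &lt;f&gt; formula element alongside its cached &lt;v&gt; value, while a typed-in number carries only &lt;v&gt;. A minimal sketch (function name ours) that counts the two populations:

```python
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by worksheet parts.
SHEET_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

def formula_coverage(sheet_xml):
    """Return (formula_cells, value_only_cells) for one worksheet's XML."""
    root = ET.fromstring(sheet_xml)
    formula = value_only = 0
    for cell in root.iter(f"{SHEET_NS}c"):
        if cell.find(f"{SHEET_NS}f") is not None:
            formula += 1
        elif cell.find(f"{SHEET_NS}v") is not None:
            value_only += 1
    return formula, value_only
```

A sheet full of totals and derived columns with zero formula cells is the anomaly this check is after; it's a signal to investigate, not a verdict.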
PowerPoint slide-master leaks — what they reveal about a deck's origin
A .pptx file carries far more origin information than the slides themselves. The slide master, theme, and embedded media all leak provenance. Here's what to look for when you're trying to figure out where a slide deck actually came from.
5 min read
When two students' essays share metadata fingerprints
Sometimes two student submissions look unrelated on the surface but share specific metadata fields — same creator name, same Application version, same revision-save IDs. Here's what each kind of shared fingerprint means and what to do about it.
6 min read
How to scan a PDF for tampering — an instructor's guide
Most academic-integrity tooling treats PDFs as black boxes. They're not. A PDF carries metadata fields, structural markers, and content-extraction signals you can read without specialized software. Here's a practical walkthrough.
6 min read
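One of the simplest checks the walkthrough style above relies on: a PDF saved once ends in a single %%EOF marker, and each incremental update (an edit appended after the original save) adds another. A one-line sketch of counting them, assuming you have the raw bytes:

```python
def incremental_update_count(pdf_bytes):
    """Count %%EOF markers in raw PDF bytes.

    More than one means the file was modified after its first save
    via incremental update. That is evidence of later edits, not
    proof of tampering -- many legitimate tools append updates.
    """
    return pdf_bytes.count(b"%%EOF")
```

Some writers rewrite the whole file on save instead of appending, so a count of one doesn't prove the file was never edited either.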
Why Google Docs exports look different in a forensics scan
A .docx downloaded from Google Docs is shaped differently from a .docx authored directly in Word. The differences are file-structural — different metadata fields, different XML, different edit-history shape — and they're not a sign of wrongdoing. Here's what to expect.
6 min read
Browse by signal category
Deep-dive landing pages for every detection signal Autotend Forensics surfaces. Each explains what the signal looks at, what it can detect, and where it commonly misreads.
Metadata
Author, timestamps, application, edit time, revision count — the fields written automatically every time a document is saved.
Edit history
Revision marks, tracked changes, hidden text, residue from accepted suggestions.
Paste detection
Large unbroken text blocks, missing typing tempo, mismatched formatting signatures.
Font & encoding
Character-set mismatches, font fallback patterns, embedded font hashes.
AI-assisted writing signals
Patterns commonly found in AI-assisted writing — surfaced as signals, never asserted as verdicts.
Structural
ZIP-shape oddities, embedded objects, file-system path leaks, export-source fingerprints.
Browse by file format
Format-specific forensic guides — what authorship signals each format carries, and where the format itself limits what review can surface.
DOCX
Microsoft Word's OOXML container — the format Autotend Forensics surfaces the most signals on.
PDF
PDF documents — fewer authoring signals than DOCX, but still useful for production-tool and edit-time evidence.
PPTX
PowerPoint decks — slide-shape, template, and embedded-object signals.
XLSX
Excel spreadsheets — formula coverage, cell-style anomalies, and worksheet structure.
ODT
OpenDocument Text — LibreOffice/OpenOffice authoring signals.
Pages
Apple Pages — the iWork bundle and its export-to-DOCX trail.
RTF
Rich Text Format — older but still common; minimal metadata but distinctive container signals.