Paste detection signals — Autotend Forensics Methodology

Large unbroken text blocks, missing typing tempo, mismatched formatting signatures.

What paste detection looks at

Original typing leaves a particular shape in the document: sentences are written in roughly the order they appear, edits happen at the cursor, formatting accumulates incrementally as the author selects words and applies bold/italic. Pasted text arrives all at once, often with the source's formatting attached.

Paste-detection signals look for the difference between these two shapes.

What the signals are

Long unbroken text runs. A typing author produces lots of short edits — a word here, a fix there, a paragraph break. Pasted text shows up as a single uninterrupted run of identical formatting (same font, same size, same color, same language tag). Autotend Forensics flags runs above a threshold (configurable; defaults to ~500 characters of identical formatting).
Mismatched formatting signatures. Text authored in Word, in Google Docs, or copied out of a ChatGPT conversation carries subtle differences in its formatting metadata. A document with predominantly Word formatting that contains a 2,000-character block in Google Docs export style is likely drawing on two sources.
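Missing typing tempo. Some platforms record per-word edit timestamps, from which total edit time follows. A 1,500-word essay with 4 minutes of recorded edit time works out to 375 words per minute. That is several times what even a very fast typist can sustain, before allowing any time to compose the text, so the essay cannot have been typed in that window.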
Language-tag flips. Pasted text sometimes carries a different language tag from the rest of the document: for example, xml:lang="en-US" on the surrounding paragraphs but xml:lang="en-GB" on the pasted block, because the source document was set to British English.
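A minimal sketch of the run-length check, in Python. It assumes the document has already been parsed into the runs its file format stores, each carrying a font, size, color, and language tag. The Run type, the analyze_runs helper, and the use of that four-field tuple as a stand-in for a formatting signature are all hypothetical; this is not Autotend Forensics' actual data model or implementation. The 500-character default mirrors the approximate default described above.

```python
from dataclasses import dataclass


# Hypothetical model: one run as stored by the file format, with the
# formatting attributes this page lists (font, size, color, language tag).
@dataclass(frozen=True)
class Run:
    text: str
    font: str
    size_pt: float
    color: str
    lang: str

    def fingerprint(self) -> tuple:
        # Assumed stand-in for a "formatting signature".
        return (self.font, self.size_pt, self.color, self.lang)


def analyze_runs(runs: list[Run], threshold: int = 500) -> dict:
    """Report the longest run, every run longer than the threshold,
    and how many distinct formatting fingerprints appear."""
    if not runs:
        return {"longest_chars": 0, "longest_fingerprint": None,
                "flagged": [], "distinct_fingerprints": 0}
    longest = max(runs, key=lambda r: len(r.text))
    # (index, length) for each run above the threshold.
    flagged = [(i, len(r.text)) for i, r in enumerate(runs)
               if len(r.text) > threshold]
    return {
        "longest_chars": len(longest.text),
        "longest_fingerprint": longest.fingerprint(),
        "flagged": flagged,
        "distinct_fingerprints": len({r.fingerprint() for r in runs}),
    }


if __name__ == "__main__":
    doc = [
        Run("The experiment ", "Calibri", 11.0, "000000", "en-US"),
        Run("showed", "Calibri", 11.0, "000000", "en-US"),
        # Stand-in for a pasted block with foreign formatting.
        Run("x" * 1200, "Arial", 11.0, "222222", "en-GB"),
        Run(" a clear effect.", "Calibri", 11.0, "000000", "en-US"),
    ]
    print(analyze_runs(doc))
```

Counting distinct fingerprints this way is cruder than recognizing which tool produced the formatting (Word vs. Google Docs), which would need format-specific metadata; the sketch only shows the shape of the check.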

What paste detection cannot tell you

Pasting is legitimate for quoted material, references, and data tables. Every signal here has a benign explanation:

A student who writes their essay in Notes / Notion / Bear and pastes into Word at the end will trigger the long-run flag for the entire essay.
Citation blocks pasted from a reference manager are paste-detected.
Code samples in a CS assignment are by definition long unbroken runs.

Use the signal as a starting point for a conversation: "This block looks like it was authored elsewhere. Where did the draft come from?" That question is often more productive than the detection itself.

What we surface

Autotend Forensics highlights:

The longest unbroken formatting run in the document, with its character count and the formatting fingerprint.
The count of distinct formatting signatures present in the body (1 is unusual; 10+ suggests multi-source work).
Word count relative to recorded edit time, expressed as words per minute.
Language-tag inventory (every distinct xml:lang value found in the document).
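The last two items in the list above reduce to simple arithmetic and an attribute scan. This sketch assumes the document body is available as XML that marks language with xml:lang, as in the en-US/en-GB example earlier on this page; file formats that record language another way would need their own lookup. The helper names words_per_minute and lang_inventory are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the xml:lang attribute under the predefined XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def words_per_minute(word_count: int, edit_seconds: float) -> float:
    """Word count divided by recorded edit time in minutes."""
    if edit_seconds <= 0:
        raise ValueError("edit time must be positive")
    return word_count / (edit_seconds / 60)


def lang_inventory(xml_text: str) -> dict[str, int]:
    """Count how many elements carry each distinct xml:lang value."""
    root = ET.fromstring(xml_text)
    counts: dict[str, int] = {}
    for elem in root.iter():
        lang = elem.get(XML_LANG)
        if lang is not None:
            counts[lang] = counts.get(lang, 0) + 1
    return counts


if __name__ == "__main__":
    sample = """<body>
  <p xml:lang="en-US">Typed introduction.</p>
  <p xml:lang="en-GB">Pasted paragraph about colour.</p>
  <p xml:lang="en-US">Typed conclusion.</p>
</body>"""
    print(lang_inventory(sample))          # {'en-US': 2, 'en-GB': 1}
    print(words_per_minute(1500, 4 * 60))  # 375.0
```

The second call reproduces the 1,500-word, 4-minute example from the typing-tempo signal. A minority tag in the inventory is a prompt to look at where that block came from, not a finding in itself.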

Pair these with edit-history and metadata signals to triangulate before raising an integrity concern.