Large unbroken text blocks, missing typing tempo, mismatched formatting signatures.
What paste detection looks at
Original typing leaves a particular shape in the document: sentences are written in roughly the order they appear, edits happen at the cursor, formatting accumulates incrementally as the author selects words and applies bold/italic. Pasted text arrives all at once, often with the source's formatting attached.
Paste-detection signals look for the difference between these two shapes.
What the signals are
- Long unbroken text runs. A typing author produces lots of short edits — a word here, a fix there, a paragraph break. Pasted text shows up as a single uninterrupted run of identical formatting (same font, same size, same color, same language tag). Autotend Forensics flags runs above a threshold (configurable; defaults to ~500 characters of identical formatting).
- Mismatched formatting signatures. Word, Google Docs, and text copied from ChatGPT each leave subtly different traces in the formatting metadata. A document with predominantly Word formatting that contains a 2,000-character block in Google Docs export style was likely assembled from two sources.
- Missing typing tempo. Some platforms record per-word edit timestamps. Compared against total edit time, these expose implausible typing speeds: a 1,500-word essay with 4 minutes of recorded edit time works out to 375 words per minute, several times faster than even a very fast typist can manage.
- Language-tag flips. Pasted text sometimes carries a different language tag than the rest of the document (xml:lang="en-US" on the surrounding paragraphs, en-GB on the pasted block, because the source document was British English).
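As a rough illustration of the long-run and language-tag signals, here is a minimal Python sketch. The `Run` model, its field names, and the helper functions are hypothetical and are not Autotend Forensics' actual schema or code; only the 500-character default comes from the description above.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Run:
    """One formatted span of text (hypothetical model for illustration)."""
    text: str
    font: str
    size: float
    color: str
    lang: str

    @property
    def fingerprint(self) -> tuple:
        # Formatting attributes that must all match for two runs to count as one.
        return (self.font, self.size, self.color, self.lang)


def long_runs(runs, threshold=500):
    """Merge adjacent runs with identical formatting; return those above threshold."""
    flagged = []
    for fingerprint, group in groupby(runs, key=lambda r: r.fingerprint):
        length = sum(len(r.text) for r in group)
        if length > threshold:
            flagged.append((fingerprint, length))
    return flagged


def language_flips(runs):
    """Return indices of runs whose language tag differs from the previous run."""
    return [i for i in range(1, len(runs)) if runs[i].lang != runs[i - 1].lang]


doc = [
    Run("Intro typed by hand. ", "Calibri", 11, "000000", "en-US"),
    Run("x" * 600, "Arial", 11, "000000", "en-GB"),
    Run("Closing remark.", "Calibri", 11, "000000", "en-US"),
]
print(long_runs(doc))
print(language_flips(doc))
```

In this toy document the 600-character Arial block is the only run above the threshold, and the language tag flips both entering and leaving it. As the next section explains, a hit like this is a prompt for a question, not a verdict.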
What paste detection cannot tell you
Pasting is legitimate for quoted material, references, and data tables. Every signal here has a benign explanation:
- A student who writes their essay in Notes / Notion / Bear and pastes into Word at the end will trigger the long-run flag for the entire essay.
- Citation blocks pasted from a reference manager will be flagged as pasted text.
- Code samples in a CS assignment are by definition long unbroken runs.
Use the signal as a starting point for a conversation: "This block looks like it was authored elsewhere. Where did the draft come from?" That question is often more productive than the detection itself.
What we surface
Autotend Forensics highlights:
- The longest unbroken formatting run in the document, with its character count and the formatting fingerprint.
- The count of distinct formatting signatures present in the body (1 is unusual; 10+ suggests multi-source work).
- Edit-time vs. word-count ratio (words-per-minute equivalent).
- Language-tag inventory (every distinct xml:lang value found in the document).
Pair these with edit-history and metadata signals to triangulate before raising an integrity concern.
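The four highlighted values can be sketched as one summary function. This is an assumption-laden illustration, not the tool's implementation: the `(text, fingerprint, lang)` tuple shape, the `summarize` name, and the output keys are all hypothetical, and the input is assumed to have adjacent identically formatted runs already merged.

```python
def summarize(runs, edit_minutes):
    """Summarize paste-detection signals.

    runs: list of (text, fingerprint, lang) tuples (hypothetical shape),
          with adjacent identically formatted runs already merged.
    edit_minutes: total recorded edit time in minutes.
    """
    if not runs:
        raise ValueError("runs must not be empty")
    if edit_minutes <= 0:
        raise ValueError("edit_minutes must be positive")

    # Longest run by character count, reported with its formatting fingerprint.
    longest_text, longest_fingerprint, _ = max(runs, key=lambda r: len(r[0]))
    # Whitespace-separated tokens serve as a simple word count.
    word_count = sum(len(text.split()) for text, _, _ in runs)

    return {
        "longest_run_chars": len(longest_text),
        "longest_run_fingerprint": longest_fingerprint,
        "distinct_signatures": len({fp for _, fp, _ in runs}),
        "words_per_minute": word_count / edit_minutes,
        "language_tags": sorted({lang for _, _, lang in runs}),
    }


# 1,500 words with 4 minutes of recorded edit time, split across two sources.
runs = [
    ("word " * 1000, "word-default", "en-US"),
    ("word " * 500, "gdocs-export", "en-GB"),
]
report = summarize(runs, edit_minutes=4)
print(report)
```

For this input the words-per-minute equivalent is 1,500 / 4 = 375, the same implausible tempo used as the example in the signals list.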
Scan a document for paste detection now.
Free, browser-only, no signup. Autotend Forensics runs entirely in your browser.