For courses that grade spreadsheets — finance, statistics, accounting, data-analysis — the visible cells don't tell you whether the student built the model or copied someone else's. Two .xlsx files can have identical visible values, identical layouts, and look interchangeable on the surface. The forensics signal is how the formulas and cell styles relate to each other. This page covers what to look at.
What's inside an xlsx
Like other Office formats, .xlsx is a ZIP archive. The relevant files:
- xl/workbook.xml — the list of sheets in the workbook.
- xl/worksheets/sheet1.xml, ... — per-sheet content. Each <c> cell carries a formula (<f>), a value (<v>), and a style index.
- xl/styles.xml — the style table. Each cell points to a style index here; the styles file defines fonts, fills, borders, and number formats.
- xl/sharedStrings.xml — common strings across the workbook.
- xl/calcChain.xml — Excel's calculation dependency graph (often present in workbooks with formulas).
The relationships between these files carry the forensics signal.
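Because an .xlsx is just a ZIP, the parts above can be inspected with nothing but the standard library. A quick sketch (the helper name is illustrative, not part of any tool):

```python
import zipfile

def list_xlsx_parts(xlsx):
    """Return the internal part names of an .xlsx archive (it is a ZIP)."""
    with zipfile.ZipFile(xlsx) as zf:
        return zf.namelist()
```

Running this on a typical workbook shows xl/workbook.xml, the per-sheet parts, xl/styles.xml, and — when formulas exist — xl/calcChain.xml.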
The "typed-over" pattern
When a student copies someone else's spreadsheet and then types over the cell values without re-entering the formulas, the result has a characteristic shape:
- Formulas removed from cells that originally had them. The <f> elements are gone but the <v> (value) elements remain — the visible numbers look right, but the structural backbone is hollow.
- Cell styles unchanged. The student didn't reformat the cells; the styles still match whoever made the original.
- calcChain.xml may still reference cells that no longer have formulas. Excel writes calcChain at calculation time; if the file hasn't been recalculated since the typing-over, the calcChain points at non-existent formulas.
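The calcChain discrepancy can be checked mechanically: collect the cells that still carry an <f> element, then diff against the cells calcChain.xml lists. A minimal sketch, assuming the default part names (the function is illustrative, not the scanner's actual code):

```python
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace, used by both sheet parts and calcChain.xml.
M = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

def hollowed_cells(xlsx, sheet_part="xl/worksheets/sheet1.xml"):
    """Cells that calcChain.xml still references but that no longer
    carry an <f> element -- the typed-over signature."""
    with zipfile.ZipFile(xlsx) as zf:
        sheet = ET.fromstring(zf.read(sheet_part))
        with_formula = {c.get("r") for c in sheet.iter(M + "c")
                        if c.find(M + "f") is not None}
        if "xl/calcChain.xml" not in zf.namelist():
            return set()  # no calc chain, nothing to cross-check
        chain = ET.fromstring(zf.read("xl/calcChain.xml"))
        chained = {c.get("r") for c in chain.iter(M + "c")}
    return chained - with_formula
```

A non-empty result means the calc chain points at cells whose formulas are gone — worth a closer look, with the false-positive caveats below in mind.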
A scanner can detect the discrepancy between formula presence and style sophistication. A cell with a sophisticated multi-step style (custom number format, conditional formatting reference, named-range source) but a typed-in literal value where a formula would be expected is a red flag.
The "built-from-template" pattern
Students who legitimately use a template the instructor provided show a different signature:
- Cell styles match the template (expected; the template defined them).
- Formulas appear in the cells the template scaffolded (also expected; the template's formulas weren't touched).
- Cell values vary across submissions (the student plugged their own data in).
- calcChain.xml is consistent with the formula set (Excel recalculated cleanly after the student's edits).
This pattern is benign. Many courses provide spreadsheet templates and that's the expected workflow.
The "built-from-scratch" pattern
The strongest authenticity signal for spreadsheet work:
- Cell styles are simple or default. Most students don't customize Excel formatting heavily; default font, basic number formats, occasional bold for headers.
- Formulas appear where they should. Sums, averages, lookups — present, not hollowed out.
- Formulas reference local sheet cells (e.g., =SUM(B2:B20)), not external workbooks or named ranges defined elsewhere.
- No calcChain.xml mismatches.
A spreadsheet that looks simple but works correctly is consistent with a student who built it themselves.
What scanning xlsx can surface
The forensics scoring engine has dedicated xlsx detectors. The most useful:
xlsx-formulas detector
Counts formulas vs values across sheets. Flags cells where:
- A complex style is applied but the cell value is a plain literal (no formula).
- A formula's expected output doesn't match the displayed value (rare; suggests manual override).
- Formula density drops sharply across worksheets that should have similar structure.
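The density signal above amounts to a per-sheet ratio of formula cells to total cells. A sketch of that computation, assuming typical part names (this is not the detector's actual implementation):

```python
import zipfile
import xml.etree.ElementTree as ET

M = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

def formula_density(xlsx):
    """Per-sheet ratio of cells carrying an <f> element to all <c> cells.
    A sharp drop between structurally similar sheets is the flag."""
    out = {}
    with zipfile.ZipFile(xlsx) as zf:
        for name in zf.namelist():
            if not name.startswith("xl/worksheets/"):
                continue
            root = ET.fromstring(zf.read(name))
            cells = list(root.iter(M + "c"))
            formulas = sum(1 for c in cells if c.find(M + "f") is not None)
            out[name] = formulas / len(cells) if cells else 0.0
    return out
```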
xlsx-cell-style detector
Compares the style-table sophistication against the formula sophistication. A workbook with 50+ custom styles defined in styles.xml but only 3 formulas across the whole workbook is suspicious — it looks like a fancy template that someone hollowed out.
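A crude version of that comparison counts the <xf> entries under <cellXfs> in styles.xml against the workbook's total formula count. A sketch under the same part-name assumptions (illustrative only; the real detector weighs style sophistication, not just counts):

```python
import zipfile
import xml.etree.ElementTree as ET

M = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

def style_vs_formula_counts(xlsx):
    """Return (number of cell formats in styles.xml, number of formulas).
    A large first number with a tiny second one suggests a hollowed template."""
    with zipfile.ZipFile(xlsx) as zf:
        styles = ET.fromstring(zf.read("xl/styles.xml"))
        cell_xfs = styles.find(M + "cellXfs")
        n_styles = len(cell_xfs) if cell_xfs is not None else 0
        n_formulas = sum(
            1
            for name in zf.namelist() if name.startswith("xl/worksheets/")
            for c in ET.fromstring(zf.read(name)).iter(M + "c")
            if c.find(M + "f") is not None)
    return n_styles, n_formulas
```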
Embedded VBA / macros
A separate signal entirely. If the workbook has embedded macros (.xlsm-like content in an .xlsx file), or references to xl/vbaProject.bin, that's a significant forensics surface. Macros can be benign (template-provided) or suspicious (downloaded from elsewhere). Worth surfacing either way.
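Detecting the macro surface is simple: the VBA project, when present, lives as a binary part in the archive. A sketch (the helper name is mine):

```python
import zipfile

def has_vba(xlsx):
    """True if the archive carries a VBA project part (xl/vbaProject.bin)."""
    with zipfile.ZipFile(xlsx) as zf:
        return any(name.endswith("vbaProject.bin") for name in zf.namelist())
```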
What this can't tell you
- Whether the analysis is correct. A spreadsheet can pass every formula-consistency check and still produce wrong answers. Forensics is silent on math correctness.
- Whether the student understands their formulas. A perfectly-built spreadsheet doesn't prove understanding.
- Whether someone helped the student. Spreadsheet help is hard to detect because cell-style fingerprints depend on the helper's machine — but a quick "did anyone walk you through this?" usually surfaces it without forensics.
Common false positives
- Instructor-provided template. The most common false positive on the typed-over pattern. If the instructor scaffolded the spreadsheet with formulas, hollowing some out is the assignment (the student is supposed to enter their own values where formulas would compute from input).
- Excel auto-conversion. When a .xlsx is opened in Numbers (macOS), edited, and re-exported, some formulas can get converted to values. Not the student's fault.
- Copy-paste-values-only. A student who computed something in one workbook and pasted the values into another can leave that pattern. There's usually a reasonable explanation.
What to do when you see it
- Compare against the assignment expectations. Did you provide a template? Were students expected to add formulas, or just values?
- Pull the per-paper rolling baseline. If everyone in the class shows the same hollowed-out pattern (because that's what the assignment was), it's not suspicious. If one student stands out, that's a different story.
- Ask the student to walk through one cell's logic. "How did you get the value in B25?" surfaces understanding (or its absence) better than forensics ever will.
What Autotend Forensics surfaces
For xlsx submissions, the report walks each sheet and surfaces:
- Per-sheet formula density vs cell-style sophistication.
- calcChain mismatches.
- Embedded macros, if any.
- Cross-submission style-table overlap (shared template detection).
All as observations. For the full xlsx methodology, see the xlsx format page.
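One simple way to approximate the cross-submission overlap check is to fingerprint each submission's style table and compare hashes. A rough sketch, assuming exact byte-level comparison of styles.xml (real detection has to tolerate serialization differences, so identical hashes are a strong signal while differing hashes prove little):

```python
import hashlib
import zipfile

def style_table_fingerprint(xlsx):
    """SHA-256 of the raw xl/styles.xml part. Identical fingerprints
    across submissions suggest a shared template -- or a shared source."""
    with zipfile.ZipFile(xlsx) as zf:
        return hashlib.sha256(zf.read("xl/styles.xml")).hexdigest()
```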