How Hidden In Numbers Works
AI leads the analysis. The Hidden Signal Engine™ verifies the findings.
Large language models perform deep semantic analysis across 36 anomaly categories. The Hidden Signal Engine™ runs in parallel — applying deterministic rules, statistical validation, and linguistic analysis as a verification layer.
Two-Layer Architecture
Every document passes through two independent layers. AI discovers — the engine verifies.
Layer 1 — AI Semantic Analysis
Leads every scan · 36 anomaly categories · 3 analysis groups
Group A — Writing Intelligence
- Grammar & spelling anomalies
- Style and clarity issues
- Passive voice overuse
- Wordiness and redundancy
- Hedge language detection
Group B — Logic & Consistency
- Logical contradictions
- Scope overreach & authority claims
- Terminology drift
- Date & temporal inconsistencies
- Missing references
Group C — Completeness & Risk
- Numerical anomalies & outliers
- Repeated entities & bias signals
- Placeholder & draft text
- Structural gaps
- Document quality risk score
Layer 2 — Hidden Signal Engine™
Deterministic verification · 7 engines · Rules, statistics & linguistics
AI Semantic Analysis
Large language models lead every scan, performing deep semantic analysis that deterministic rules cannot replicate — understanding context, intent, and meaning across the full document.
Verification Engine Deep Dives
Each of the 7 engines runs deterministically — producing auditable, reproducible results that verify and reinforce the AI findings.
Grammar Engine
14 rules, 0 guesswork
Verifies grammatical correctness using regex pattern matching, Penn Treebank POS tagging, and Hunspell spell checking. Confirms and extends AI grammar findings with deterministic rules.
How it works
1. Tokenizes sentences using ParsedDocument from the parsing pipeline
2. Applies 14 independent grammar rules, each isolated and auditable
3. Uses compromise for tense detection and parallel structure analysis
4. Uses wink-nlp POS tags (NNS/NNPS, VBZ/VBP) for subject-verb agreement verification
5. Runs nspell dictionary lookup for spelling, skipping proper nouns and acronyms
6. Returns IntelligenceFinding[] with ruleId, original, suggested, and correction fields
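
As a rough illustration of what "isolated and auditable" rules could look like, here is a minimal sketch. The field names of `IntelligenceFinding` come from the list above, but their types and meanings are assumptions, and the `repeatedWord` rule and `runRules` helper are hypothetical examples, not part of the product's 14 rules.

```typescript
// Assumed shape: field names are from the text, types and semantics are guesses.
interface IntelligenceFinding {
  ruleId: string;     // which rule produced the finding
  original: string;   // the offending text
  suggested: string;  // assumed: the replacement text
  correction: string; // assumed: a human-readable explanation
}

// Each rule is a pure function, so it can be tested and audited on its own.
type GrammarRule = (sentence: string) => IntelligenceFinding[];

// Hypothetical rule: flags a word that is immediately repeated ("is is").
const repeatedWord: GrammarRule = (sentence) => {
  const findings: IntelligenceFinding[] = [];
  const pattern = /\b(\w+)\s+\1\b/gi;
  for (const match of sentence.matchAll(pattern)) {
    findings.push({
      ruleId: "repeated-word",
      original: match[0],
      suggested: match[1],
      correction: `Remove the duplicated "${match[1]}".`,
    });
  }
  return findings;
};

// Runs every rule over every sentence and concatenates the findings.
function runRules(sentences: string[], rules: GrammarRule[]): IntelligenceFinding[] {
  return sentences.flatMap((s) => rules.flatMap((rule) => rule(s)));
}
```

Because the same input always yields the same findings, a result can be traced back to a single `ruleId` and reproduced later.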
Style Engine
8 rules + write-good integration
Verifies writing quality issues that weaken clarity — passive voice, wordiness, clichés, buzzwords, and nominalizations. Reinforces AI writing quality findings with deterministic patterns.
How it works
1. Applies 7 custom style rules optimized for formal documents (reports, audits, contracts)
2. Uses write-good with selective rule flags: adverb, so, thereIs, illusion
3. Deduplicates write-good results against existing findings to avoid redundancy
4. Passive voice detection uses auxiliary + irregular past participle pattern
5. Wordiness: 25+ phrase-to-replacement mappings with exact correction suggestions
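
The wordiness step can be pictured as a lookup table applied with word-boundary matching. The sketch below assumes that approach. The three phrases, the `StyleFinding` shape, and the `findWordiness` name are illustrative only, since the actual 25+ mappings are not listed here.

```typescript
// Illustrative entries only; the real mapping table is not published in the text.
const WORDY_PHRASES: Record<string, string> = {
  "in order to": "to",
  "due to the fact that": "because",
  "at this point in time": "now",
};

// Hypothetical finding shape for this sketch.
interface StyleFinding {
  original: string;  // the phrase as it appears in the text
  suggested: string; // the shorter replacement
  index: number;     // character offset of the match
}

// Scans the text for each phrase (case-insensitive, whole words only)
// and returns findings in document order.
function findWordiness(text: string): StyleFinding[] {
  const findings: StyleFinding[] = [];
  for (const [phrase, replacement] of Object.entries(WORDY_PHRASES)) {
    const pattern = new RegExp(`\\b${phrase}\\b`, "gi");
    for (const match of text.matchAll(pattern)) {
      findings.push({ original: match[0], suggested: replacement, index: match.index ?? 0 });
    }
  }
  return findings.sort((a, b) => a.index - b.index);
}
```

A fixed table like this is what makes an exact correction possible: every flagged phrase has exactly one known replacement.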
Readability Engine
Flesch-Kincaid + accurate syllables
Calculates Flesch Reading Ease and Flesch-Kincaid Grade Level using accurate syllable counting. Provides objective readability metrics that complement AI writing quality analysis.
How it works
1. Counts syllables using the syllable package (handles edge cases hand-rolled versions miss)
2. Flesch Reading Ease: 206.835 − 1.015×(words/sentences) − 84.6×(syllables/words)
3. Flesch-Kincaid Grade: 0.39×(words/sentences) + 11.8×(syllables/words) − 15.59
4. Complex word ratio: words with 3+ syllables ÷ total words (excludes capitalized words)
5. Reading time in minutes: word count ÷ 238 words per minute (average adult reading speed)
6. Grade level mapped to: Elementary / Middle School / High School / College / Graduate / Professional
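
The two formulas above translate directly into code. This sketch takes the word, sentence, and syllable counts as inputs rather than computing them, so it makes no assumption about how the syllable package is called. The `TextCounts` type and the function names are hypothetical.

```typescript
// Hypothetical input type: counts are assumed to be computed upstream.
interface TextCounts {
  words: number;
  sentences: number;
  syllables: number;
}

// Flesch Reading Ease: higher scores mean easier text.
function fleschReadingEase({ words, sentences, syllables }: TextCounts): number {
  return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words);
}

// Flesch-Kincaid Grade Level: approximate US school grade.
function fleschKincaidGrade({ words, sentences, syllables }: TextCounts): number {
  return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59;
}

// Reading time in minutes at the 238 words-per-minute rate quoted above.
function readingTimeMinutes(words: number, wordsPerMinute = 238): number {
  return words / wordsPerMinute;
}
```

For a text with 100 words, 5 sentences, and 150 syllables, the formulas give a Reading Ease of about 59.6 and a grade level of about 9.9.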
Consistency Engine
Jaro-Winkler + TF-IDF analysis
Detects naming inconsistencies, unit conflicts, spelling variant mixing, and terminology drift using TF-IDF and Jaro-Winkler distance. Provides statistical verification for AI consistency findings.
How it works
1. Name matching: groups by last name AND Jaro-Winkler similarity ≥ 0.88 across all name pairs
2. Catches cross-format variants like 'Robert Johnson' vs 'Bob Johnson'
3. TF-IDF identifies top-15 high-importance terms per document
4. For each key term, checks if it appears in 3+ different capitalizations
5. Unit inconsistency checks: distance (km/miles), weight (kg/lbs), temperature (°C/°F), data (MB/Mb)
6. British/American spelling: 17 pair checks using word boundary regex
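
For readers unfamiliar with Jaro-Winkler, the sketch below is a textbook implementation of the similarity measure with the 0.88 threshold from the list above. It is not the engine's code: the function names are hypothetical, the standard prefix scale of 0.1 is assumed, and how the score is combined with last-name grouping is not specified here.

```typescript
// Jaro similarity: based on matching characters within a sliding window
// and the number of transpositions among those matches.
function jaro(a: string, b: string): number {
  if (a === b) return 1;
  if (a.length === 0 || b.length === 0) return 0;
  const window = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const aMatched = new Array<boolean>(a.length).fill(false);
  const bMatched = new Array<boolean>(b.length).fill(false);
  let matches = 0;
  for (let i = 0; i < a.length; i++) {
    const lo = Math.max(0, i - window);
    const hi = Math.min(b.length - 1, i + window);
    for (let j = lo; j <= hi; j++) {
      if (!bMatched[j] && a[i] === b[j]) {
        aMatched[i] = true;
        bMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;
  // Walk the matched characters of both strings in order and count mismatches.
  let k = 0;
  let outOfOrder = 0;
  for (let i = 0; i < a.length; i++) {
    if (!aMatched[i]) continue;
    while (!bMatched[k]) k++;
    if (a[i] !== b[k]) outOfOrder++;
    k++;
  }
  const transpositions = outOfOrder / 2;
  return (matches / a.length + matches / b.length + (matches - transpositions) / matches) / 3;
}

// Jaro-Winkler: boosts the Jaro score for a shared prefix of up to 4 characters.
function jaroWinkler(a: string, b: string, prefixScale = 0.1): number {
  const base = jaro(a, b);
  const maxPrefix = Math.min(4, a.length, b.length);
  let prefix = 0;
  while (prefix < maxPrefix && a[prefix] === b[prefix]) prefix++;
  return base + prefix * prefixScale * (1 - base);
}

// Threshold taken from the text above.
function isLikelySameName(a: string, b: string, threshold = 0.88): boolean {
  return jaroWinkler(a.toLowerCase(), b.toLowerCase()) >= threshold;
}
```

On the classic test pair "MARTHA" and "MARHTA" this yields roughly 0.961, and "Jon Smith" against "John Smith" scores well above the 0.88 cut-off.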
Logic Engine
Rule-based contradiction detection
Detects logical conflicts, temporal inconsistencies, missing references, and scope overreach using deterministic rule patterns. Provides verifiable evidence for AI contradiction findings.
How it works
1. Contradiction detection: 20 polar-pair patterns (increase/decrease, success/failure, etc.)
2. Same metric with multiple different values (metric phrase extraction + value comparison)
3. Circular comparisons: A > B AND B > A, detected via regex over full document text
4. Date conflict detection: temporal ordering validation across all extracted DateToken pairs
5. Missing reference: anchor resolution, with cross-references checked against the heading/figure index
6. Scope overreach: 8 absolute/superlative pattern classes with softening-qualifier exclusions
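
The circular-comparison check is the easiest of these to sketch. The example below assumes a single simplified sentence pattern ("X is higher than Y") and single-word operands. The regex, the `Comparison` type, and the function names are illustrative, not the engine's actual patterns.

```typescript
// Hypothetical representation of one extracted "greater than" claim.
interface Comparison {
  greater: string;
  lesser: string;
}

// Extracts claims such as "Revenue was higher than cost" from the full text.
// Only single-word operands are handled in this simplified pattern.
function extractComparisons(text: string): Comparison[] {
  const pattern = /\b(\w+) (?:is|was|are|were) (?:greater|higher|larger) than (\w+)\b/gi;
  return Array.from(text.matchAll(pattern), (m) => ({
    greater: m[1].toLowerCase(),
    lesser: m[2].toLowerCase(),
  }));
}

// Reports each pair [a, b] where the text claims both a > b and b > a.
function findCircularComparisons(text: string): Array<[string, string]> {
  const seen = new Set<string>();
  const conflicts: Array<[string, string]> = [];
  for (const { greater, lesser } of extractComparisons(text)) {
    // A conflict exists if the reverse claim was seen earlier in the text.
    if (seen.has(`${lesser}>${greater}`)) conflicts.push([lesser, greater]);
    seen.add(`${greater}>${lesser}`);
  }
  return conflicts;
}
```

Given "Revenue was higher than cost in Q1. Later, cost was higher than revenue.", the function returns one conflict for the pair revenue and cost.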
Numerical Intelligence Engine
Z-score + IQR statistical analysis
Statistical analysis of all number tokens using Z-score and IQR fence detection. Surfaces outliers, impossible values, and table summation errors with mathematical precision.
How it works
1. Extracts all NumberToken[] from ParsedDocument (value, unit, context)
2. Groups numbers by unit category (percentages, currency, counts, etc.)
3. Z-score outlier: value more than 2.5 standard deviations from group mean
4. IQR fence: value above Q3 + 1.5×IQR or below Q1 − 1.5×IQR
5. Impossible percentage: value outside 0–100% range (excluding explicitly labeled growth rates)
6. Table sum verification: column totals checked against stated totals with ±0.5% tolerance
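
The statistical checks above can be sketched in a few small functions operating on plain number arrays. Several details are assumptions: the population standard deviation is used, quartiles use linear interpolation, and the ±0.5% tolerance is measured relative to the stated total. All function names are hypothetical.

```typescript
function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Population standard deviation (assumed; the text does not say which variant is used).
function stdDev(values: number[]): number {
  const m = mean(values);
  return Math.sqrt(mean(values.map((v) => (v - m) ** 2)));
}

// Quantile with linear interpolation between the two nearest ranks.
function quantile(sorted: number[], q: number): number {
  const pos = q * (sorted.length - 1);
  const lower = Math.floor(pos);
  const upper = Math.ceil(pos);
  return sorted[lower] + (sorted[upper] - sorted[lower]) * (pos - lower);
}

// Values more than `threshold` standard deviations from the group mean.
function zScoreOutliers(values: number[], threshold = 2.5): number[] {
  const m = mean(values);
  const sd = stdDev(values);
  if (sd === 0) return [];
  return values.filter((v) => Math.abs(v - m) / sd > threshold);
}

// Values outside the fences Q1 - k*IQR and Q3 + k*IQR.
function iqrOutliers(values: number[], k = 1.5): number[] {
  const sorted = [...values].sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;
  return values.filter((v) => v < q1 - k * iqr || v > q3 + k * iqr);
}

// True when the column sum is within the relative tolerance of the stated total.
function sumMatches(column: number[], statedTotal: number, tolerance = 0.005): boolean {
  const sum = column.reduce((acc, v) => acc + v, 0);
  return Math.abs(sum - statedTotal) <= tolerance * Math.abs(statedTotal);
}
```

Note that with a population standard deviation, a group needs at least 8 values before any single value can exceed a Z-score of 2.5, which is one reason to pair the Z-score test with IQR fences for small groups.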
Structure Engine
Heading inference + section validation
Infers document structure from paragraph characteristics and validates heading hierarchy, section completeness, and list formatting — providing structural context for AI findings.
How it works
1. Heading inference: paragraphs of 2-10 words not ending with sentence punctuation
2. Level detection: ALL CAPS → H1, numbered patterns → H1/H2/H3, word count → H2/H3
3. Empty section: heading immediately followed by another heading (paraIndex diff = 1)
4. Broken numbering: extracts numbered headings, checks for gaps in sequence
5. List inconsistency: detects 3+ consecutive list items with mixed sentence/fragment endings
6. Missing intro/conclusion: keyword scan of first/last 3 paragraphs for signal phrases
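
Heading inference and the empty-section check can be illustrated together. This sketch assumes "sentence punctuation" means a period, exclamation mark, or question mark, and that paragraphs arrive as an array of strings. The function names are hypothetical.

```typescript
// A paragraph is treated as a heading when it has 2 to 10 words
// and does not end with sentence punctuation (assumed here: . ! ?).
function looksLikeHeading(paragraph: string): boolean {
  const trimmed = paragraph.trim();
  const wordCount = trimmed.split(/\s+/).filter(Boolean).length;
  const endsWithPunctuation = /[.!?]$/.test(trimmed);
  return wordCount >= 2 && wordCount <= 10 && !endsWithPunctuation;
}

// Returns the paragraph index of every inferred heading that is
// immediately followed by another inferred heading (index difference of 1).
function findEmptySections(paragraphs: string[]): number[] {
  const headingIndexes = paragraphs
    .map((p, i) => (looksLikeHeading(p) ? i : -1))
    .filter((i) => i >= 0);
  const empty: number[] = [];
  for (let n = 0; n + 1 < headingIndexes.length; n++) {
    if (headingIndexes[n + 1] - headingIndexes[n] === 1) empty.push(headingIndexes[n]);
  }
  return empty;
}
```

A heuristic like this will misread some short paragraphs without closing punctuation as headings, so treating its output as a signal rather than a verdict seems prudent.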
Document Health Score
Every Pro scan produces a health score from 0 to 100 across 8 dimensions: 7 weighted dimensions plus an overall score. Each weighted dimension is scored independently using a deduction formula: start at 100, then deduct for each finding in proportion to severity × confidence.
| Dimension | Weight | Finding categories |
|---|---|---|
| Grammar | 20% | grammar, spelling |
| Logic | 20% | contradiction, missing-reference, authority-claims, scope-overreach |
| Readability | 15% | readability, reading-complexity |
| Consistency | 15% | consistency, terminology-drift, formatting-inconsistency |
| Structure | 15% | structure, broken-link, placeholder-text |
| Style | 10% | style, hedge-language |
| Numerical | 5% | numerical-anomaly |
| Overall | 100% | Weighted average of all dimensions |
Deduction formula: score = max(0, 100 − Σ(severity_weight × confidence)). Severity weights: critical 20 · high 12 · medium 6 · low 2.
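
The deduction formula and the weights in the table combine as follows. This sketch assumes confidence is a value between 0 and 1. The type and function names are hypothetical, while the severity and dimension weights are copied from the text above.

```typescript
type Severity = "critical" | "high" | "medium" | "low";

// Severity weights from the deduction formula above.
const SEVERITY_WEIGHT: Record<Severity, number> = {
  critical: 20,
  high: 12,
  medium: 6,
  low: 2,
};

// Hypothetical finding shape; confidence is assumed to lie in [0, 1].
interface ScoredFinding {
  severity: Severity;
  confidence: number;
}

// score = max(0, 100 - sum(severity_weight * confidence))
function dimensionScore(findings: ScoredFinding[]): number {
  const deduction = findings.reduce(
    (sum, f) => sum + SEVERITY_WEIGHT[f.severity] * f.confidence,
    0,
  );
  return Math.max(0, 100 - deduction);
}

// Dimension weights from the table above (they sum to 1).
const DIMENSION_WEIGHT = {
  grammar: 0.2,
  logic: 0.2,
  readability: 0.15,
  consistency: 0.15,
  structure: 0.15,
  style: 0.1,
  numerical: 0.05,
};
type Dimension = keyof typeof DIMENSION_WEIGHT;

// Overall score: weighted average of the seven dimension scores.
function overallScore(scores: Record<Dimension, number>): number {
  return (Object.keys(DIMENSION_WEIGHT) as Dimension[]).reduce(
    (sum, d) => sum + DIMENSION_WEIGHT[d] * scores[d],
    0,
  );
}
```

For example, a Grammar dimension with one critical finding at confidence 1.0 and one medium finding at confidence 0.5 scores 100 − 20 − 3 = 77. If every other dimension scores 100, the overall score is 0.2 × 77 + 0.8 × 100 = 95.4.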
