Hidden In Numbers: AI-Powered Document Intelligence

How Hidden In Numbers Works

AI leads the analysis. The Hidden Signal Engine™ verifies the findings.

Large language models perform deep semantic analysis across 36 anomaly categories. The Hidden Signal Engine™ runs in parallel — applying deterministic rules, statistical validation, and linguistic analysis as a verification layer.

Two-Layer Architecture

Every document passes through two independent layers. AI discovers — the engine verifies.

Layer 1 — AI Semantic Analysis

Leads every scan · 36 anomaly categories · 3 analysis groups

PRIMARY

Group A — Writing Intelligence

  • Grammar & spelling anomalies
  • Style and clarity issues
  • Passive voice overuse
  • Wordiness and redundancy
  • Hedge language detection

Group B — Logic & Consistency

  • Logical contradictions
  • Scope overreach & authority claims
  • Terminology drift
  • Date & temporal inconsistencies
  • Missing references

Group C — Completeness & Risk

  • Numerical anomalies & outliers
  • Repeated entities & bias signals
  • Placeholder & draft text
  • Structural gaps
  • Document quality risk score
verified by

Layer 2 — Hidden Signal Engine™

Deterministic verification · 7 engines · Rules, statistics & linguistics

VERIFIER
  • Grammar Engine
  • Style Engine
  • Readability Engine
  • Consistency Engine
  • Logic Engine
  • Numerical Intelligence Engine
  • Structure Engine

Document Health Score (8 dimensions, 0-100)

AI Semantic Analysis

Large language models lead every scan, performing deep semantic analysis that deterministic rules cannot replicate — understanding context, intent, and meaning across the full document.

Model pool: Llama 4 Maverick leads every group. If a model call fails, backup models rotate in automatically, so a scan can still complete when a provider is rate-limited.

Verification Engine Deep Dives

Each of the 7 engines runs deterministically — producing auditable, reproducible results that verify and reinforce the AI findings.

Grammar Engine

14 rules, 0 guesswork

Verifies grammatical correctness using regex pattern matching, Penn Treebank POS tagging, and Hunspell spell checking. Confirms and extends AI grammar findings with deterministic rules.

How it works

  1. Tokenizes sentences using ParsedDocument from the parsing pipeline
  2. Applies 14 independent grammar rules, each isolated and auditable
  3. Uses compromise for tense detection and parallel structure analysis
  4. Uses wink-nlp POS tags (NNS/NNPS, VBZ/VBP) for subject-verb agreement verification
  5. Runs nspell dictionary lookup for spelling, skipping proper nouns and acronyms
  6. Returns IntelligenceFinding[] with ruleId, original, suggested, and correction fields
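
As a rough illustration of what one isolated, auditable rule might look like, here is a minimal TypeScript sketch. Only the field names ruleId, original, suggested, and correction come from the description above. The field types, the `repeated-word` rule, and the function name `findRepeatedWords` are assumptions made for this example and are not the engine's actual code.

```typescript
// Hypothetical shape: only the four field names are taken from the text above.
interface IntelligenceFinding {
  ruleId: string;
  original: string;   // the text span that triggered the rule
  suggested: string;  // human-readable suggestion
  correction: string; // replacement text for the span
}

// Illustrative rule (an assumption): flags an immediately repeated word
// such as "was was". The same input always yields the same findings.
function findRepeatedWords(sentence: string): IntelligenceFinding[] {
  const findings: IntelligenceFinding[] = [];
  const pattern = /\b([A-Za-z]+)\s+\1\b/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(sentence)) !== null) {
    findings.push({
      ruleId: "repeated-word",
      original: match[0],
      suggested: `Remove the repeated word "${match[1]}"`,
      correction: match[1],
    });
  }
  return findings;
}
```

A pure regex rule like this will also flag legitimate repetitions such as "had had", which is one reason a rule set would plausibly be paired with POS tagging.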

Style Engine

8 rules + write-good integration

Verifies writing quality issues that weaken clarity — passive voice, wordiness, clichés, buzzwords, and nominalizations. Reinforces AI writing quality findings with deterministic patterns.

How it works

  1. Applies 7 custom style rules optimized for formal documents (reports, audits, contracts)
  2. Uses write-good with selective rule flags: adverb, so, thereIs, illusion
  3. Deduplicates write-good results against existing findings to avoid redundancy
  4. Passive voice detection uses auxiliary + irregular past participle pattern
  5. Wordiness: 25+ phrase-to-replacement mappings with exact correction suggestions
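
The phrase-to-replacement approach in the last step can be sketched as a small lookup table plus a word-boundary search. The three mappings, the `WordinessHit` type, and the function name below are hypothetical. The engine's real table is described only as having 25+ entries.

```typescript
// Hypothetical entries for illustration; not the engine's actual mapping table.
const WORDY_PHRASES: Array<[string, string]> = [
  ["in order to", "to"],
  ["due to the fact that", "because"],
  ["at this point in time", "now"],
];

interface WordinessHit {
  phrase: string;      // the text as it appears in the document
  replacement: string; // the exact suggested correction
  index: number;       // character offset of the match
}

// Scans for each wordy phrase, case-insensitively, on word boundaries.
function findWordiness(text: string): WordinessHit[] {
  const hits: WordinessHit[] = [];
  for (const [phrase, replacement] of WORDY_PHRASES) {
    const pattern = new RegExp("\\b" + phrase + "\\b", "gi");
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(text)) !== null) {
      hits.push({ phrase: match[0], replacement, index: match.index });
    }
  }
  // Report hits in document order regardless of table order.
  return hits.sort((a, b) => a.index - b.index);
}
```

Because each hit carries its character offset, a later pass could compare offsets to drop write-good results that overlap an existing finding.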

Readability Engine

Flesch-Kincaid + accurate syllables

Calculates Flesch Reading Ease and Flesch-Kincaid Grade Level using accurate syllable counting. Provides objective readability metrics that complement AI writing quality analysis.

How it works

  1. Counts syllables using the syllable package (handles edge cases hand-rolled versions miss)
  2. Flesch Reading Ease: 206.835 − 1.015×(words/sentences) − 84.6×(syllables/words)
  3. Flesch-Kincaid Grade: 0.39×(words/sentences) + 11.8×(syllables/words) − 15.59
  4. Complex word ratio: words with 3+ syllables ÷ total words (excludes capitalized words)
  5. Reading time in minutes: word count ÷ 238 words per minute (average adult reading speed)
  6. Grade level mapped to: Elementary / Middle School / High School / College / Graduate / Professional
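
The two formulas and the reading-time estimate above translate directly into code. This sketch takes pre-computed counts as input, so it makes no claim about how the syllable package is called. The `TextCounts` type and the function names are assumptions.

```typescript
// Counts are assumed to be computed upstream (for example by a syllable counter).
interface TextCounts {
  words: number;
  sentences: number;
  syllables: number;
}

// Flesch Reading Ease: higher scores mean easier text.
function fleschReadingEase(c: TextCounts): number {
  return 206.835 - 1.015 * (c.words / c.sentences) - 84.6 * (c.syllables / c.words);
}

// Flesch-Kincaid Grade Level: approximate US school grade.
function fleschKincaidGrade(c: TextCounts): number {
  return 0.39 * (c.words / c.sentences) + 11.8 * (c.syllables / c.words) - 15.59;
}

// Reading time in minutes at 238 words per minute.
function readingTimeMinutes(words: number): number {
  return words / 238;
}
```

For a passage of 100 words, 5 sentences, and 150 syllables, this gives a Reading Ease of about 59.6 and a grade level of about 9.9.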

Consistency Engine

Jaro-Winkler + TF-IDF analysis

Detects naming inconsistencies, unit conflicts, spelling variant mixing, and terminology drift using TF-IDF and Jaro-Winkler distance. Provides statistical verification for AI consistency findings.

How it works

  1. Name matching: groups by last name AND Jaro-Winkler similarity ≥ 0.88 across all name pairs
  2. Catches cross-format variants like 'Robert Johnson' vs 'Bob Johnson'
  3. TF-IDF identifies top-15 high-importance terms per document
  4. For each key term, checks if it appears in 3+ different capitalizations
  5. Unit inconsistency checks: distance (km/miles), weight (kg/lbs), temperature (°C/°F), data (MB/Mb)
  6. British/American spelling: 17 pair checks using word boundary regex
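
For readers unfamiliar with the similarity measure in step 1, here is a self-contained sketch of standard Jaro-Winkler with the 0.88 threshold from the text. The prefix scale of 0.1, the 4-character prefix cap, the lowercasing, and the helper name `isLikelySameName` are assumptions. The engine may use a library implementation instead.

```typescript
// Jaro similarity: matching characters within a sliding window, minus transpositions.
function jaro(a: string, b: string): number {
  if (a === b) return 1;
  if (a.length === 0 || b.length === 0) return 0;
  const range = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const aMatched: boolean[] = [];
  const bMatched: boolean[] = [];
  let matches = 0;
  for (let i = 0; i < a.length; i++) {
    const lo = Math.max(0, i - range);
    const hi = Math.min(b.length - 1, i + range);
    for (let j = lo; j <= hi; j++) {
      if (!bMatched[j] && a.charAt(i) === b.charAt(j)) {
        aMatched[i] = true;
        bMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;
  // Count matched characters that appear in a different order in each string.
  let k = 0;
  let outOfOrder = 0;
  for (let i = 0; i < a.length; i++) {
    if (!aMatched[i]) continue;
    while (!bMatched[k]) k++;
    if (a.charAt(i) !== b.charAt(k)) outOfOrder++;
    k++;
  }
  const transpositions = outOfOrder / 2;
  return (matches / a.length + matches / b.length + (matches - transpositions) / matches) / 3;
}

// Winkler adjustment: boosts pairs that share a common prefix (up to 4 characters).
function jaroWinkler(a: string, b: string, prefixScale: number = 0.1): number {
  const j = jaro(a, b);
  const maxPrefix = Math.min(4, a.length, b.length);
  let prefix = 0;
  while (prefix < maxPrefix && a.charAt(prefix) === b.charAt(prefix)) prefix++;
  return j + prefix * prefixScale * (1 - j);
}

// Threshold from the text above; case folding is an assumption of this sketch.
function isLikelySameName(a: string, b: string, threshold: number = 0.88): boolean {
  return jaroWinkler(a.toLowerCase(), b.toLowerCase()) >= threshold;
}
```

String similarity alone handles spelling variants such as 'Jon Smith' and 'John Smith'. Nickname pairs are a different case, which is presumably why the engine also groups by last name.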

Logic Engine

Rule-based contradiction detection

Detects logical conflicts, temporal inconsistencies, missing references, and scope overreach using deterministic rule patterns. Provides verifiable evidence for AI contradiction findings.

How it works

  1. Contradiction detection: 20 polar-pair patterns (increase/decrease, success/failure, etc.)
  2. Same metric with multiple different values (metric phrase extraction + value comparison)
  3. Circular comparisons: A > B AND B > A, detected via regex over full document text
  4. Date conflict detection: temporal ordering validation across all extracted DateToken pairs
  5. Missing reference: anchor resolution, with cross-references checked against heading/figure index
  6. Scope overreach: 8 absolute/superlative pattern classes with softening-qualifier exclusions
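
The circular-comparison check in step 3 can be sketched as follows. The regex, which is limited to capitalized terms and a few comparison phrases, along with the `Comparison` type and function names, are simplifying assumptions. The engine's actual patterns are not documented here.

```typescript
interface Comparison {
  left: string;  // the term claimed to be greater
  right: string; // the term claimed to be smaller
}

// Extracts "A is/was greater|larger|higher than B" claims for capitalized terms.
function extractComparisons(text: string): Comparison[] {
  const pattern = /\b([A-Z][\w-]*) (?:is |was )?(?:greater|larger|higher) than ([A-Z][\w-]*)/g;
  const found: Comparison[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    found.push({ left: match[1], right: match[2] });
  }
  return found;
}

// Reports every pair of claims where A > B and B > A both appear.
function findCircularComparisons(text: string): Array<[Comparison, Comparison]> {
  const claims = extractComparisons(text);
  const conflicts: Array<[Comparison, Comparison]> = [];
  for (let i = 0; i < claims.length; i++) {
    for (let j = i + 1; j < claims.length; j++) {
      if (claims[i].left === claims[j].right && claims[i].right === claims[j].left) {
        conflicts.push([claims[i], claims[j]]);
      }
    }
  }
  return conflicts;
}
```

Each conflict keeps both original claims, so a finding can quote the two contradictory statements side by side.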

Numerical Intelligence Engine

Z-score + IQR statistical analysis

Statistical analysis of all number tokens using Z-score and IQR fence detection. Surfaces outliers, impossible values, and table summation errors with mathematical precision.

How it works

  1. Extracts all NumberToken[] from ParsedDocument (value, unit, context)
  2. Groups numbers by unit category (percentages, currency, counts, etc.)
  3. Z-score outlier: value more than 2.5 standard deviations from group mean
  4. IQR fence: value above Q3 + 1.5×IQR or below Q1 − 1.5×IQR
  5. Impossible percentage: value outside 0-100% range (excluding explicitly labeled growth rates)
  6. Table sum verification: column totals checked against stated totals with ±0.5% tolerance
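
Steps 3, 4, and 6 can be sketched on a plain array of numbers. The thresholds (2.5 standard deviations, 1.5×IQR, ±0.5%) come from the text. The use of population standard deviation, linear-interpolation quartiles, and all function names are assumptions of this sketch.

```typescript
function mean(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Z-score outliers: values more than `threshold` standard deviations from the mean.
// Uses the population standard deviation (an assumption).
function zScoreOutliers(values: number[], threshold: number = 2.5): number[] {
  if (values.length < 2) return [];
  const m = mean(values);
  const sd = Math.sqrt(mean(values.map((v) => (v - m) * (v - m))));
  if (sd === 0) return [];
  return values.filter((v) => Math.abs(v - m) / sd > threshold);
}

// Quantile by linear interpolation between the two nearest ranks (an assumption).
function quantile(sorted: number[], q: number): number {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// IQR fence: values above Q3 + 1.5×IQR or below Q1 − 1.5×IQR.
function iqrOutliers(values: number[]): number[] {
  if (values.length === 0) return [];
  const sorted = values.slice().sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;
  return values.filter((v) => v > q3 + 1.5 * iqr || v < q1 - 1.5 * iqr);
}

// Table sum check: computed column total within ±0.5% of the stated total.
function tableSumMatches(cells: number[], statedTotal: number, tolerance: number = 0.005): boolean {
  const sum = cells.reduce((s, x) => s + x, 0);
  return Math.abs(sum - statedTotal) <= tolerance * Math.abs(statedTotal);
}
```

The two outlier tests complement each other. In small groups the Z-score is bounded by the sample size, so a group of fewer than about eight values can never exceed 2.5 standard deviations, while the IQR fence still applies.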

Structure Engine

Heading inference + section validation

Infers document structure from paragraph characteristics and validates heading hierarchy, section completeness, and list formatting — providing structural context for AI findings.

How it works

  1. Heading inference: paragraphs of 2-10 words not ending with sentence punctuation
  2. Level detection: ALL CAPS → H1, numbered patterns → H1/H2/H3, word count → H2/H3
  3. Empty section: heading immediately followed by another heading (paraIndex diff = 1)
  4. Broken numbering: extracts numbered headings, checks for gaps in sequence
  5. List inconsistency: detects 3+ consecutive list items with mixed sentence/fragment endings
  6. Missing intro/conclusion: keyword scan of first/last 3 paragraphs for signal phrases
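
Steps 1 and 3 can be sketched with two small functions over an array of paragraph strings. The word-count bounds come from the text. Treating only `.`, `!`, and `?` as sentence punctuation, and the function names, are assumptions.

```typescript
// Heading heuristic: 2-10 words and no sentence-ending punctuation.
function looksLikeHeading(paragraph: string): boolean {
  const text = paragraph.trim();
  const wordCount = text.length === 0 ? 0 : text.split(/\s+/).length;
  const endsWithPunctuation = /[.!?]$/.test(text);
  return wordCount >= 2 && wordCount <= 10 && !endsWithPunctuation;
}

// Empty section: a heading whose next paragraph (index + 1) is also a heading.
// Returns the paragraph indexes of the empty headings.
function findEmptySections(paragraphs: string[]): number[] {
  const empty: number[] = [];
  for (let i = 0; i + 1 < paragraphs.length; i++) {
    if (looksLikeHeading(paragraphs[i]) && looksLikeHeading(paragraphs[i + 1])) {
      empty.push(i);
    }
  }
  return empty;
}
```

A real implementation would likely need to exclude a parent heading that is followed directly by its first subheading, which this sketch would report as empty.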

Document Health Score

Every Pro scan produces a Document Health Score (0-100) across 8 dimensions: the seven weighted dimensions in the table below plus the overall score. Each dimension is scored independently using a deduction formula: start at 100, then deduct severity weight × confidence for each finding.

Dimension   | Weight | Finding categories
------------|--------|-------------------
Grammar     | 20%    | grammar, spelling
Logic       | 20%    | contradiction, missing-reference, authority-claims, scope-overreach
Readability | 15%    | readability, reading-complexity
Consistency | 15%    | consistency, terminology-drift, formatting-inconsistency
Structure   | 15%    | structure, broken-link, placeholder-text
Style       | 10%    | style, hedge-language
Numerical   | 5%     | numerical-anomaly
Overall     | 100%   | Weighted average of all dimensions

Deduction formula: score = max(0, 100 − Σ(severity_weight × confidence)). Severity weights: critical 20 · high 12 · medium 6 · low 2.
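
As a worked illustration, the deduction formula and the weighted overall score can be sketched like this. The severity weights and dimension weights are taken from the text above. The type names, function names, and the assumption that confidence is a number between 0 and 1 are hypothetical.

```typescript
type Severity = "critical" | "high" | "medium" | "low";

interface ScoredFinding {
  severity: Severity;
  confidence: number; // assumed to lie in [0, 1]
}

const SEVERITY_WEIGHT: Record<Severity, number> = {
  critical: 20,
  high: 12,
  medium: 6,
  low: 2,
};

// score = max(0, 100 − Σ(severity_weight × confidence))
function dimensionScore(findings: ScoredFinding[]): number {
  const deduction = findings.reduce(
    (sum, f) => sum + SEVERITY_WEIGHT[f.severity] * f.confidence,
    0
  );
  return Math.max(0, 100 - deduction);
}

// Dimension weights in percent, as listed in the table above.
const DIMENSION_WEIGHT_PERCENT = {
  grammar: 20,
  logic: 20,
  readability: 15,
  consistency: 15,
  structure: 15,
  style: 10,
  numerical: 5,
};

type Dimension = keyof typeof DIMENSION_WEIGHT_PERCENT;

// Overall score: weighted average of the seven dimension scores.
function overallScore(scores: Record<Dimension, number>): number {
  let total = 0;
  (Object.keys(DIMENSION_WEIGHT_PERCENT) as Dimension[]).forEach((d) => {
    total += DIMENSION_WEIGHT_PERCENT[d] * scores[d];
  });
  return total / 100;
}
```

For example, one critical finding at confidence 1.0 and one medium finding at confidence 0.5 deduct 20 + 3 = 23 points, leaving a dimension score of 77. If that dimension is Grammar and every other dimension scores 100, the overall score is 95.4.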

See it in action

Upload a document and get a full AI intelligence report in seconds.

Discover What Your Documents Are Hiding

AI-powered document intelligence that uncovers contradictions, anomalies, missing references, and hidden patterns — in seconds.