Response Time Modelling
Raw speed is not safe as an achievement adjustment, but guarded accuracy–speed profiles may be, once evidence gates are cleared
1 Summary
The screener records response times (RTs), but RTs are not automatically interpretable as mathematical fluency or effort. A response time can reflect item difficulty, interface modality, rounding/capping, whether the item was reached, whether the final response was scorable, administration conditions, and student strategy. This page explains the clean RT contract and the path from speed models to possible response-process profiles.
Speed is not ready to change live bands, but it is not being abandoned. The current evidence supports a guarded path: clean the response-time data contract, estimate effective speed only where timing is interpretable, protect slow-accurate students, audit rapid-risk patterns, and test whether joint accuracy–RT models can support stable response-process profiles. The active joint path is J0/J1/J2, with J2 now split conceptually into legacy J2a and planned J2b. The full run shows that J0 and J1 are viable research models and that J2 Foundation is interpretable as a legacy rapid-response sensitivity result. J2 Year 1, however, failed diagnostics, and the global RT <= 2s rapid rule is now legacy/provisional only.
2 RT data contract: what is included and why
The clean speed likelihood uses a deliberately narrow response-time contract.
| Include | Exclude | Reason |
|---|---|---|
| Timed-math probes | Number-line tasks | The construct is timed-math effective speed, not general platform speed. |
| Valid/scorable rows | Invalid/non-scorable rows | The RT likelihood should not mix valid work with non-valid mechanisms. |
| Positive response time | Zero-RT QC rows | Zero RT is a QC edge case, not a response-time observation. |
| Rounded Janison RT treated as interval-censored | STPM/STDD speed anchors and legacy Year 1 Term 3 ASDD_2025/ASDD | These rows have different constructs or known modality/scoring risks. |

Response time is modelled as a rounded / interval-censored lognormal process using lower and upper RT bounds supplied to the Stan data. A larger latent speed parameter means faster effective speed. Achievement remains separate unless a joint model is explicitly being tested. See Model specifications for the full formulae.
The Janison rounding convention still needs final platform confirmation before the interval-censoring interpretation is treated as final. If recorded seconds are floored, ceiled, rounded to nearest second, or exact elapsed integers, the correct interval differs.
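To show why the unconfirmed convention matters, the sketch below maps one recorded integer second to interval bounds under each candidate convention. The function name and the convention labels are hypothetical illustrations. This is not the project's Stan data-preparation code.

```python
from typing import Tuple

# Hypothetical labels for the candidate conventions named in the text; the
# actual Janison convention is still awaiting platform confirmation.
CONVENTIONS = ("floor", "ceil", "nearest", "exact")


def rt_interval(recorded_sec: int, convention: str) -> Tuple[float, float]:
    """Map one recorded integer RT to (lower, upper) bounds on the true RT T.

    Open vs closed endpoints are ignored: for a continuous lognormal they do
    not change the interval probability.
    """
    if recorded_sec <= 0:
        # Zero-RT rows are QC edge cases and are excluded by the data contract.
        raise ValueError("RT rows in the contract must be positive")
    r = float(recorded_sec)
    if convention == "floor":    # recorded = floor(T)  ->  r <= T < r + 1
        return r, r + 1.0
    if convention == "ceil":     # recorded = ceil(T)   ->  r - 1 < T <= r
        return r - 1.0, r
    if convention == "nearest":  # recorded = round(T)  ->  r - 0.5 <= T < r + 0.5
        return r - 0.5, r + 0.5
    if convention == "exact":    # recorded value is the elapsed time itself
        return r, r
    raise ValueError(f"unknown convention: {convention!r}")
```

Take a recorded 3 s as an example. The bounds are (3, 4) under floor, (2, 3) under ceil and (2.5, 3.5) under nearest. Under exact they collapse to (3, 3). That degenerate case would call for a lognormal density term rather than an interval probability.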
The latest speed/joint evidence is grounded mainly in pooled Term 3/4 person-administration data for the clean timed-math families. It is not a full Term 1/3/4 longitudinal speed-profile model. Term 1 evidence remains part of the broader project history, but temporal generalisation of speed profiles is still a future gate.
3 Speed-only model (pooled): T1b
The pooled speed model estimates effective speed from timed-math, valid/scorable, positive-RT rows after excluding legacy/problematic forms. It is under active validation only and must not influence live bands.
| Year | cor(T1 FE speed, Stan speed) | cor(Stan speed, current achievement) | Interpretation |
|---|---|---|---|
| Foundation | 0.923 | 0.125 | Stable speed signal; weak relation to achievement. |
| Year 1 | 0.928 | 0.189 | Stable speed signal; modest relation to achievement. |
The weak-to-modest relationship with achievement is descriptive only. Observed speed-achievement correlations should not be used to infer a cognitive speed–ability relation unless item time-intensity, item difficulty, person sampling, and form design are accounted for. This argues against folding raw speed directly into the achievement score, but supports further response-process validation.
4 Speed-only model (family-specific): T1c
Family-specific speed models test whether one pooled speed dimension is enough. They show related but not identical speed signals across probe families. T1c is speed-only: family speed dimensions are not jointly modelled with achievement in this branch; the joint family-speed model enters at J1/J2.
| Year | Family | N | cor(pooled speed) | cor(current achievement) | cor(valid rate) | cor(trailing rate) |
|---|---|---|---|---|---|---|
| Foundation | Match quantity | 2525 | 0.755 | 0.157 | 0.511 | -0.632 |
| Foundation | Missing number | 2491 | 0.859 | 0.124 | 0.548 | -0.664 |
| Year 1 | Arithmetic | 1366 | 0.862 | 0.255 | 0.714 | -0.786 |
| Year 1 | Missing number | 2558 | 0.926 | 0.190 | 0.621 | -0.718 |
Family-speed estimates are related but not interchangeable:
| Year | Family comparison | Person-speed correlation | Interpretation |
|---|---|---|---|
| Foundation | Missing number vs match quantity | 0.433 | Moderate relation: one pooled speed factor is probably too crude for these families. |
| Year 1 | Arithmetic vs missing number | 0.566 | Moderate relation: arithmetic and missing-number speed share signal but are not identical. |
These results support broad family-specific speed factors, not a separate speed factor for every small probe or item.
The strong association between estimated speed and valid/trailing row rates shows that the speed signal is not pure automaticity. It also reflects opportunity, completion, item-row dynamics, and possibly administration or modality effects. This is another reason speed should remain under validation and should not be used as a direct achievement adjustment.

5 Why family-specific speed, not family-specific achievement?
The current evidence supports a broad achievement score with probe/testlet effects, but allows speed to vary by task family. These are not contradictory claims.
Accuracy asks whether the student can solve the item. Response time asks how quickly the student produces a valid response under a specific task process and interface. Correctness across tasks can be driven by a broad numeracy achievement dimension, while time taken can vary more by format, motor demand, response modality, visual search, and strategy.
The Ability Structure Review supports one broad achievement score with local probe effects rather than separate operational achievement subscores. Family-specific achievement scores would need stronger evidence: enough items per family, reliable subscore precision, external validation, stability over time, fairness checks, and clear instructional interpretation.
Family-specific speed is safer to treat as response-process context. The current structure is therefore:
- Achievement: one broad theta + item/probe/testlet effects;
- Speed: broad task-family speed factors for eligible timed-math families;
- Rapid-risk: a separate rapid-response sensitivity/audit signal;
- A4: a separate response-state/context layer outside achievement scoring.
6 Which task families are covered?
The current clean speed and joint runs do not estimate a unique speed factor for every task. They cover broad eligible timed-math families with enough clean RT evidence.
| Task family / probe type | Current speed treatment | Reason |
|---|---|---|
| Missing number | Included as timed-math speed family where eligible. | Timed task process with clean valid/scorable positive-RT rows. |
| Match quantity | Included as timed-math speed family where eligible. | Timed task process with clean valid/scorable positive-RT rows. |
| Arithmetic | Included for Year 1 arithmetic where eligible and not affected by legacy modality exclusions. | Timed task process, but modality/form caveats matter. |
| Number line | Excluded from main speed factor; treat RT as separate number-line response-process/context evidence if used. | Untimed/spatial estimation; time may reflect deliberation or clicking strategy rather than fluency. |
| Decomposition / DMT | Not included in the current clean joint speed evidence unless timing/data-contract checks later justify it. | Requires timing-contract, RT-distribution, and construct checks before inclusion. |
| Magnitude comparison | Not included in the current clean joint speed evidence unless timing/data-contract checks later justify it. | Requires timing-contract, RT-distribution, and construct checks before inclusion. |
Untimed tasks can still have useful RT/context information, but that evidence should not be folded into the timed-math speed factor. For number line in particular, a better future construct is a separate response-style audit such as rapid-clicking versus deliberate estimation, not general effective speed.
7 Rapid-risk and slow-accurate audit cohorts
The descriptive profile flags are audit cohorts, not interventions or classifications. They test whether accuracy and RT together can separate qualitatively different response-process patterns.
Threshold definitions used for these audit cohorts should be treated as provisional. In particular, RT <= 2s is a legacy comparator, not an operational rapid-risk definition.
| Flag | Accuracy condition | Speed / RT condition | Completion condition | Minimum evidence |
|---|---|---|---|---|
| Slow-accurate | High reached accuracy / upper achievement group | Slow speed / lower speed quintile | Sufficient valid RT evidence | Pending documentation |
| Fast-low-accuracy | Low reached accuracy / lower achievement group | Fast speed / upper speed quintile | Sufficient valid RT evidence | Pending documentation |
| Rapid-risk | Low accuracy or suspicious rapid pattern | Elevated rapid-response rate; legacy examples used RT <= 2s rows | Sufficient valid RT evidence | Pending documentation |
| High-trailing high-reached-accuracy | High reached accuracy | Not primarily speed-defined | High trailing proxy rate | Pending documentation |

Current audit-cohort sizes by year and family:
| Year | Family | N person-admins | Slow-accurate | Fast-low-accuracy | Rapid-risk | High-trailing high-reached-accuracy |
|---|---|---|---|---|---|---|
| Foundation | Missing number | 2491 | 190 | 205 | 479 | 215 |
| Foundation | Match quantity | 2525 | 157 | 237 | 184 | 156 |
| Year 1 | Missing number | 2558 | 292 | 173 | 230 | 298 |
| Year 1 | Arithmetic | 1366 | 80 | 86 | 105 | 91 |
7.1 Accuracy-speed surface

The key point: some high-achievement groups are not the fastest, and some rapid groups need audit rather than reward. A one-dimensional speed rule would misclassify both.
8 Slow-accurate protection
The key operational principle is: speed should not mean faster is better. Slow-accurate students may be demonstrating the target achievement with a slower response process. Any profile system must protect them from being penalised for speed alone.
9 Joint accuracy–RT models: J0/J1/J2
The joint accuracy–response-time family (J0, J1, J2) tests whether achievement and speed should be estimated together. The full J0/J1/J2 run is now available and has been reviewed at the aggregate diagnostic level.
| Model | Purpose | Diagnostic result | Current status | Can it change live bands now? |
|---|---|---|---|---|
| J0 | Baseline joint accuracy–RT: one achievement and one speed dimension, bivariate normal. | Basic two-chain diagnostics acceptable in Foundation and Year 1: 0 divergences; max treedepth 6; min E-BFMI 0.596; scalar max R-hat 1.015. | Research summary available; not decision evidence. | No |
| J1 | Extends J0 with family-specific speed dimensions. | Basic two-chain diagnostics acceptable in Foundation and Year 1: 0 divergences; max treedepth 6; min E-BFMI 0.676; scalar max R-hat 1.011. | Research summary available; not decision evidence. | No |
| J2 | Extends J1 with a rapid-response indicator in the accuracy model. | Mixed: Foundation acceptable, but Year 1 failed diagnostics (544 divergences in one chain, E-BFMI 0.029, scalar max R-hat 12.39). | Not ready; Year 1 requires reparameterisation or model revision. | No |
Run ID: m3-joint-accuracy-rt-j0j1j2-full-20260525T015526Z.
Joint models must beat simpler models on decision-relevant evidence. They do not win by being more complex or more theoretically complete. The current run supports research interpretation of J0/J1 only; it does not clear any joint model for reporting or live bands.
The immediate next modelling step is not to promote the old J2. The old J2 is now treated as J2a (legacy sensitivity only) because its rapid flag is the global RT <= 2s rule. A J2 Year 1 rerun is only a comparability option, not the main promotion gate. The preferred next candidate is J2b, which uses family/item rapid-threshold flags within the clean timed families. Residual-RT conditional accuracy is reserved for a later J3 candidate.
9.1 What the full run says so far
| Model | Year | N person-admins | cor(theta, A2) | cor(speed, median RT) | cor(speed, rapid rate) | cor(speed, accuracy) | Interpretation status |
|---|---|---|---|---|---|---|---|
| J0 | Foundation | 2537 | 0.595 | -0.804 | 0.722 | -0.205 | Research-only |
| J0 | Year 1 | 2559 | 0.604 | -0.757 | 0.758 | -0.090 | Research-only |
| J1 | Foundation | 2537 | 0.600 | -0.792 | 0.676 | -0.166 | Research-only |
| J1 | Year 1 | 2559 | 0.607 | -0.757 | 0.736 | -0.054 | Research-only |
| J2 | Foundation | 2537 | 0.606 | -0.793 | 0.674 | -0.159 | Foundation only; caution |
| J2 | Year 1 | 2559 | 0.607 | -0.747 | 0.720 | -0.020 | Not interpretable: failed diagnostics |
The J0/J1 speed dimensions behave like response-time process estimates: they correlate strongly with median RT and rapid-response rate, but only weakly with accuracy. This supports the current governance position: speed is response-process evidence, not an achievement adjustment. The moderate J0/J1 correlations with A2 achievement (about 0.60) also show that the joint achievement scale is not simply a drop-in replacement for A2; disagreements would need separate validation before any reporting use.
For J2a, the Foundation rapid-response coefficient is strongly negative (about -1.04) under acceptable diagnostics, supporting a rapid-response sensitivity signal for that slice. The Year 1 rapid-response coefficient is also negative in the scalar summaries, but the Year 1 fit failed diagnostics and should not be interpreted. Treat old J2/J2a as legacy sensitivity evidence only; it does not define operational rapid-risk.
10 What is established, and what is not
| Established so far | Not established yet |
|---|---|
| Joint accuracy–speed modelling is technically viable for J0/J1 and for J2 Foundation. | J2 Year 1 is not trustworthy, and old J2/J2a is not the preferred next model because its rapid rule is global RT <= 2s. |
| Achievement and speed are related but distinct dimensions. | Whether speed could safely change live achievement bands; until that is shown, it must not. |
| The speed estimate behaves as expected: higher speed corresponds to shorter median RT and higher rapid-response rate. | External validity and classification utility of speed profiles beyond A2 are not yet established. |
| Family-specific speed is justified as response-process context for eligible timed-math families. | Operational cut rules for profiles are not final. |
| Legacy J2a Foundation shows a strong negative rapid-response effect, but the global RT <= 2s rule is provisional only. | Fairness and safety across cohorts, schools/classes, devices/admin contexts, and subgroups are not yet established. |
| Slow-accurate students exist and require protection from speed penalties. | Temporal generalisation from Term 3/4 to Term 1/3/4 is not yet established. |
The operational direction is therefore: keep achievement as the separate scoring construct, keep A2 as the IRT achievement reference for model comparisons, and develop guarded accuracy–speed response-process profiles around it. M0 remains the raw-score comparison benchmark for live reporting.
11 J2b rapid-threshold decision
The current decision is to keep old J2 as J2a / legacy sensitivity evidence and use J2b as the preferred next candidate if this branch continues. J2b remains shadow_only_not_for_live_bands. It would use only the current clean timed families:
- Foundation: missing number and match quantity;
- Year 1: missing number and arithmetic under the existing clean data contract.
Magnitude comparison is excluded/deferred because it is a binary-choice special case with interval/discrete RT behaviour; ultra-fast responses can be accurate, and side-bias/distance-effect metadata are not yet available. Number line, decomposition, STPM/picture match, and STDD remain collateral/context evidence, not timed-speed-factor inputs.
The planned primary J2b flag is:
rapid_j2b = rt_sec <= min(item_p05_rt, family_p05_rt)
computed within the clean valid/scorable positive-RT timed-math contract. Accuracy-collapse evidence is used to validate or challenge the flag; it does not define the flag. Residual-log-RT thresholding is reserved for a later J3 candidate.
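The sketch below shows one way the planned flag could be computed. The row layout, the function names and the percentile definition are all assumptions. The definition used is linear interpolation between order statistics, and the planned spec does not fix one. The sketch also has no minimum row count per item, which a real implementation may need before trusting item-level percentiles.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str, float]  # (family, item_id, rt_sec); hypothetical row layout


def p05(values: List[float]) -> float:
    """5th percentile with linear interpolation between order statistics.

    The quantile definition is an assumption; the planned J2b spec does not fix one.
    """
    xs = sorted(values)
    pos = 0.05 * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def rapid_j2b_flags(rows: List[Row]) -> List[bool]:
    """Flag rows with rt_sec <= min(item_p05_rt, family_p05_rt).

    `rows` must already be restricted to the clean valid/scorable positive-RT
    timed-math contract; no filtering happens here.
    """
    by_item: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    by_family: Dict[str, List[float]] = defaultdict(list)
    for family, item, rt in rows:
        by_item[(family, item)].append(rt)
        by_family[family].append(rt)
    item_p05 = {key: p05(v) for key, v in by_item.items()}
    family_p05 = {key: p05(v) for key, v in by_family.items()}
    return [rt <= min(item_p05[(family, item)], family_p05[family])
            for family, item, rt in rows]
```

Taking the minimum makes the flag the stricter of the two thresholds. An item whose own 5th percentile is slow cannot loosen the cutoff beyond the family level, and a slow family cannot loosen it beyond a fast item's level.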
12 Model specifications
This page focuses on speed and joint accuracy–RT specifications. The A1/A2/A3 achievement-side specifications belong in the State of Play and Ability Structure Review. The joint models below use an A2-style achievement component, but the formulae here focus on the RT and joint-model additions.
12.1 T1b — Speed-only model (pooled)
Response time \(T_{pi}\) is represented in Stan by lower and upper bounds \((\ell_{pi}, u_{pi})\) and modelled as interval-censored. Current runs used the rounded-RT data contract, but the exact Janison convention still needs final platform verification before interpreting the bounds as floor, ceiling, nearest-second rounding, or exact elapsed integer seconds.
\[\Pr(\ell_{pi} < T_{pi} \le u_{pi}) = \Phi_{\text{LN}}\!\left(u_{pi};\, \mu_{pi},\, \sigma\right) - \Phi_{\text{LN}}\!\left(\ell_{pi};\, \mu_{pi},\, \sigma\right)\]
where \(\Phi_{\text{LN}}\) is the lognormal CDF and the location parameter is:
\[\mu_{pi} = \beta_0 + \beta_i - \text{speed}_p, \quad \text{speed}_p = \sigma_{\text{speed}} \cdot z_p\]
A larger \(\text{speed}_p\) lowers \(\mu_{pi}\) and shifts the RT distribution toward shorter times, meaning a faster expected response. \(\beta_i\) is a mean-centred item time-intensity effect; larger values imply longer expected response time. \(\beta_0\) is the overall intercept.
Priors: \(z_p \sim \mathcal{N}(0,1)\); \(\;\beta_0 \sim \mathcal{N}(\log 5,\, 1)\); \(\;\sigma \sim \text{Exp}(1)\) with lower bound 0.05; \(\;\sigma_{\text{speed}},\, \sigma_{\text{item}} \sim \text{Exp}(1)\) with lower bound 0.01.
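The T1b interval likelihood above can be evaluated directly for a single row. The Python sketch below is illustrative only and uses hypothetical function names. It takes the naive difference of CDFs, which can underflow for extreme intervals. A Stan implementation would more typically work on the log scale, for example with `log_diff_exp` over `lognormal_lcdf` values.

```python
import math


def lognormal_cdf(t: float, mu: float, sigma: float) -> float:
    """Lognormal CDF with log-scale location mu and scale sigma (0 for t <= 0)."""
    if t <= 0.0:
        return 0.0
    z = (math.log(t) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def t1b_interval_loglik(lower: float, upper: float, beta0: float,
                        beta_i: float, speed_p: float, sigma: float) -> float:
    """log Pr(lower < T <= upper) under T1b, with mu = beta0 + beta_i - speed_p."""
    mu = beta0 + beta_i - speed_p
    prob = lognormal_cdf(upper, mu, sigma) - lognormal_cdf(lower, mu, sigma)
    return math.log(prob)
```

Consider a short interval such as (1, 2] s with \(\beta_0 = \log 5\) (the prior mean). Raising \(\text{speed}_p\) from 0 to 1 makes that interval more likely, consistent with "larger speed = faster". Raising \(\beta_i\) makes it less likely.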
12.2 T1c — Speed-only model (family-specific)
As T1b, but with a separate speed factor \(\tau_{pf}\) for each probe family \(f\):
\[\mu_{pif} = \beta_0 + \beta_i - \tau_{pf}\]
Each family has its own residual SD \(\sigma_f\). The person speed vector \((\tau_{p1}, \dots, \tau_{pF})\) is not jointly modelled with achievement in T1c; joint modelling enters in J0/J1/J2.
12.3 J0 — Joint accuracy–speed (baseline)
Achievement \(\theta_p\) and speed \(\tau_p\) are drawn from a bivariate normal via the Stan non-centred parameterisation:
\[\begin{pmatrix}\theta_p \\ \tau_p\end{pmatrix} = \text{diag}(\boldsymbol{\sigma}_{\text{person}})\,\mathbf{L}_\Omega\,\mathbf{z}_p, \quad \mathbf{z}_p \sim \mathcal{N}_2(\mathbf{0},\, \mathbf{I})\]
\[\mathbf{L}_\Omega \sim \text{LKJ-Cholesky}(2), \quad \boldsymbol{\sigma}_{\text{person}} \sim \text{Exponential}(1)\]
This matches the Stan code using diag_pre_multiply(sigma_person, L_Omega).
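For two dimensions, the Cholesky factor of a correlation matrix with off-diagonal \(\rho\) is \(\begin{pmatrix}1 & 0\\ \rho & \sqrt{1-\rho^2}\end{pmatrix}\). The non-centred draw therefore reduces to two lines of arithmetic. The plain-Python sketch below uses made-up scales and \(\rho\), not fitted values. It checks by simulation that the draws recover the intended correlation.

```python
import math
import random
from typing import List, Tuple


def noncentred_theta_tau(sigma_theta: float, sigma_tau: float, rho: float,
                         z1: float, z2: float) -> Tuple[float, float]:
    """(theta_p, tau_p) = diag(sigma_person) @ L_Omega @ z_p for a 2x2 correlation rho."""
    # Cholesky factor of [[1, rho], [rho, 1]] is [[1, 0], [rho, sqrt(1 - rho^2)]].
    l21, l22 = rho, math.sqrt(1.0 - rho * rho)
    theta = sigma_theta * z1
    tau = sigma_tau * (l21 * z1 + l22 * z2)
    return theta, tau


def empirical_correlation(pairs: List[Tuple[float, float]]) -> float:
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


# Illustrative values only: sigma_theta = 1.0, sigma_tau = 0.6, rho = 0.3.
rng = random.Random(2024)
draws = [noncentred_theta_tau(1.0, 0.6, 0.3, rng.gauss(0, 1), rng.gauss(0, 1))
         for _ in range(20000)]
```

The sampler only ever sees independent standard normals in \(\mathbf{z}_p\). Scale and correlation enter deterministically, which is the usual motivation for the non-centred form in hierarchical Stan models.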
The accuracy likelihood is:
\[\Pr(Y_{pi} = 1) = \text{logit}^{-1}(b_0 + \theta_p - b_i)\]
The RT likelihood is the same interval-censored lognormal as T1b, with \(\mu_{pi} = \beta_0 + \beta_i - \tau_p\) and a single residual SD \(\sigma_{\text{RT}}\).
Item effects are mean-centred: \(b_i = \sigma_b \cdot (\tilde{b}_i - \bar{\tilde{b}})\); \(\;\beta_i = \sigma_\beta \cdot (\tilde{\beta}_i - \bar{\tilde{\beta}})\).
The posterior correlation \(\rho(\theta, \tau)\) is reported via the generated quantities block.
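A single J0 observation contributes an accuracy term and an interval-censored RT term. As written above, the two are conditionally independent given \((\theta_p, \tau_p)\), so all accuracy–speed dependence runs through the person-level correlation. The Python sketch below uses hypothetical names and illustrative inputs. It also includes the mean-centring step for item effects.

```python
import math
from typing import List


def centre_item_effects(raw: List[float], scale: float) -> List[float]:
    """b_i = scale * (raw_i - mean(raw)): the mean-centring used for b_i and beta_i."""
    m = sum(raw) / len(raw)
    return [scale * (r - m) for r in raw]


def log_inv_logit(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def lognormal_cdf(t: float, mu: float, sigma: float) -> float:
    """Lognormal CDF with log-scale location mu and scale sigma (0 for t <= 0)."""
    if t <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2.0))))


def j0_obs_loglik(y: int, lower: float, upper: float,
                  theta_p: float, tau_p: float,
                  b0: float, b_i: float,
                  beta0: float, beta_i: float, sigma_rt: float) -> float:
    """Joint log-likelihood of one (accuracy, rounded RT) observation under J0."""
    eta = b0 + theta_p - b_i
    # log Pr(Y=1) = log inv_logit(eta); log Pr(Y=0) = log inv_logit(-eta)
    acc = log_inv_logit(eta) if y == 1 else log_inv_logit(-eta)
    mu = beta0 + beta_i - tau_p
    rt = math.log(lognormal_cdf(upper, mu, sigma_rt) - lognormal_cdf(lower, mu, sigma_rt))
    return acc + rt
```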
12.4 J1 — Joint accuracy–family speed
Extends J0 to \(F\) probe families. The person parameter vector is \((F+1)\)-dimensional:
\[(\theta_p,\; \tau_{p1},\; \dots,\; \tau_{pF})^\top \;\sim\; \mathcal{N}_{F+1}\!\left(\mathbf{0},\; \boldsymbol{\Sigma}\right)\]
\[\boldsymbol{\Sigma} = \text{diag}(\boldsymbol{\sigma})\,\boldsymbol{\Omega}\,\text{diag}(\boldsymbol{\sigma}), \quad \boldsymbol{\Omega}=\mathbf{L}_\Omega\mathbf{L}_\Omega^\top, \quad \mathbf{L}_\Omega \sim \text{LKJ-Cholesky}(2)\]
The non-centred draw is \(\mathbf{x}_p = \text{diag}(\boldsymbol{\sigma})\mathbf{L}_\Omega\mathbf{z}_p\); the Stan code implements the equivalent row-vector form with diag_pre_multiply(sigma_person, L_Omega).
The RT likelihood for item \(i\) in family \(f\) uses the family-specific speed \(\tau_{pf}\) and family-specific residual \(\sigma_f\):
\[\mu_{pif} = \beta_0 + \beta_i - \tau_{pf}\]
The accuracy likelihood is identical to J0.
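The J1 equivalence between the non-centred draw and \(\boldsymbol{\Sigma}\) holds because \(\mathbf{z}_p\) has identity covariance. It follows that \(\operatorname{Cov}(\mathbf{x}_p) = \text{diag}(\boldsymbol{\sigma})\mathbf{L}_\Omega\mathbf{L}_\Omega^\top\text{diag}(\boldsymbol{\sigma}) = \boldsymbol{\Sigma}\). The Python sketch below checks this numerically for a hypothetical \(F = 2\) case with made-up scales and Cholesky factor. Here `diag_pre_multiply` is a plain-Python stand-in written to mirror the meaning of the Stan function, not Stan itself.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def diag_pre_multiply(sigma: List[float], m: Matrix) -> Matrix:
    """diag(sigma) @ m: scales row i of m by sigma[i]."""
    return [[s * v for v in row] for s, row in zip(sigma, m)]


def sigma_from_parts(sigma: List[float], l_omega: Matrix) -> Matrix:
    """Sigma = diag(sigma) @ Omega @ diag(sigma), with Omega = L_Omega @ L_Omega^T."""
    omega = matmul(l_omega, transpose(l_omega))
    d = [[sigma[i] if i == j else 0.0 for j in range(len(sigma))] for i in range(len(sigma))]
    return matmul(matmul(d, omega), d)


# Illustrative F = 2 case for (theta_p, tau_p1, tau_p2); values are made up.
sigma = [1.0, 0.5, 0.8]
l_omega = [
    [1.0, 0.0, 0.0],
    [0.5, math.sqrt(0.75), 0.0],
    [0.3, 0.4, math.sqrt(1.0 - 0.3 ** 2 - 0.4 ** 2)],
]
a = diag_pre_multiply(sigma, l_omega)   # x_p = a @ z_p with z_p ~ N(0, I)
cov_x = matmul(a, transpose(a))         # Cov(x_p) = a @ a^T
target = sigma_from_parts(sigma, l_omega)
```

Each row of \(\mathbf{L}_\Omega\) has unit norm, so \(\boldsymbol{\Omega}\) has a unit diagonal and the diagonal of \(\boldsymbol{\Sigma}\) is simply \(\sigma_k^2\).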
12.5 J2a — Legacy joint accuracy–family speed + rapid-response check
Extends J1 with a binary rapid-response indicator \(r_{pi} \in \{0, 1\}\) in the accuracy model:
\[\Pr(Y_{pi} = 1) = \text{logit}^{-1}(b_0 + \theta_p - b_i + \gamma_{\text{rapid}} \cdot r_{pi})\]
\[\gamma_{\text{rapid}} \sim \mathcal{N}(0,\, 1)\]
The parameter \(\gamma_{\text{rapid}}\) tests whether accuracy is systematically different for very rapid responses under the legacy RT <= 2s flag. It does not alter the RT likelihood. A negative estimate means the legacy rapid flag is accuracy-negative on average in that data slice; it is not a causal estimate, not a student trait, not a full rapid-guessing model, and not an operational rapid-risk definition. J2b replaces the global raw-second flag with family/item thresholds if this branch proceeds.
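To give a sense of scale, the sketch below evaluates the J2a accuracy predictor with \(\gamma_{\text{rapid}} = -1.04\), roughly the Foundation estimate reported in section 9.1. It uses an illustrative response whose non-rapid success probability is 0.8. The function names are hypothetical.

```python
import math


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def j2a_correct_prob(b0: float, theta_p: float, b_i: float,
                     gamma_rapid: float, rapid: int) -> float:
    """Pr(Y_pi = 1) = inv_logit(b0 + theta_p - b_i + gamma_rapid * r_pi), r_pi in {0, 1}."""
    return inv_logit(b0 + theta_p - b_i + gamma_rapid * rapid)


# Illustrative inputs: logit(0.8) = log(4), so the non-rapid probability is 0.8.
gamma = -1.04
p_not_rapid = j2a_correct_prob(0.0, math.log(4.0), 0.0, gamma, 0)
p_rapid = j2a_correct_prob(0.0, math.log(4.0), 0.0, gamma, 1)
odds_ratio = (p_rapid / (1 - p_rapid)) / (p_not_rapid / (1 - p_not_rapid))
```

A flagged row has its odds of success multiplied by \(e^{-1.04} \approx 0.35\). That takes the success probability from 0.80 to roughly 0.59. The same coefficient shifts the logit by a constant amount, so its effect on the probability scale depends on the baseline.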
13 Priors and identification — audit reference
| Model | Likelihood | Person parameters | Key identification constraint |
|---|---|---|---|
| T1b/T1c (speed-only, rounded lognormal) | Interval-censored lognormal RT. | speed_p = σ_speed · z_p; larger speed = faster. | Item time effects mean-centred; person speed centred by prior. |
| J0 (joint accuracy–RT) | Bernoulli-logit accuracy + interval-censored lognormal RT. | (θ_p, τ_p) from 2-dim Cholesky; larger τ = faster. | Item difficulty and time effects mean-centred. |
| J1 (joint accuracy–family speed) | As J0 with family-specific RT residual SD. | (θ_p, τ_p1, …, τ_pF) from (F+1)-dim Cholesky. | As J0; family speed factors share one multivariate/correlated prior with θ in J1/J2. |
| J2 (joint + rapid-response check) | As J1 with rapid-response term in accuracy linear predictor. | Same as J1 plus γ_rapid in accuracy predictor. | As J1; rapid indicator is observed, not a person parameter. |
14 Diagnostics and validation status
Detailed Bayesian checks belong in a technical appendix or internal diagnostics note. The public status is deliberately compact.
| Model | Run status | Diagnostics status | Checks required before interpretation | Decision status |
|---|---|---|---|---|
| T1b | Completed | Partial / under review | PPC by item/probe and RT tail; prior sensitivity where needed; R-hat/ESS/divergences/treedepth/BFMI. | Research-only |
| T1c | Completed | Partial / under review | Family-specific PPC, RT tail checks, profile-rate checks, prior sensitivity where needed. | Research-only |
| J0 | Completed | Acceptable basic full-run diagnostics | 0 divergences, acceptable scalar R-hat and E-BFMI in full run; still needs PPC and decision validation. | Not decision evidence |
| J1 | Completed | Acceptable basic full-run diagnostics | 0 divergences, acceptable scalar R-hat and E-BFMI in full run; still needs family RT PPC and decision validation. | Not decision evidence |
| J2a | Completed | Failed in Year 1 | Year 1 failed: 544 divergences in one chain, E-BFMI 0.029, scalar max R-hat 12.39; legacy sensitivity only. | Legacy sensitivity only |
| J2b | Planned/not fit | Not run | Requires implementation and full diagnostics; planned family/item thresholds only. | Planned shadow candidate only |
15 Candidate operational profiles
These are candidate labels for future validation. None are currently operational. No score, profile, or flag should be reported unless it maps to a defensible teacher action and includes misuse warnings.
| Profile | Meaning | Before reporting |
|---|---|---|
| Efficient-accurate | Accurate and timely; not a reward for speed alone. | Show that this adds useful action beyond a strong achievement score; avoid rewarding speed for its own sake. |
| Slow-accurate | Accurate but slower; protected from speed penalties. | Validate protection rule and reporting language; specify what the teacher should do differently. |
| Rapid-risk | Very rapid response process with low accuracy or suspicious pattern; audit only until validated. | Validate thresholds, false-positive risk, and whether this is audit-only or an intervention signal. |
| Slow-low-accuracy | Low accuracy plus slow/effortful response process. | Validate interpretation against outcomes and teacher evidence; clarify support intensity. |
| Completion-constrained | Evidence suggests unreached or interrupted items, not simply wrong answers. | Validate logging/admin mechanisms and whether action is administration review, extra time, or instruction. |
| Inconclusive | Logging or pattern is insufficient for a defensible profile. | Define conservative default / no-label rule. |
16 Prior iteration: Term 3 v5 R0 S1–S4 review
A prior S1–S4 joint speed-accuracy review from the March 2026 irt-joint-stan-pcm v5-t3 code path has been archived. It is useful historical background, but it is no longer the active modelling path and is partly superseded by the current M3 T1b/T1c/J0/J1/J2 line. Treat it as background only until the J0/J1/J2 results, currently reviewed at the aggregate diagnostic level only, are fully written up.