How Conflict Radar processes, verifies, and forecasts geopolitical events
The platform continuously ingests data from multiple source types, each assigned a reliability tier (1–5):
Official government sources, wire services (Reuters, AP)
Major international outlets (BBC, Al Jazeera, GDELT)
Regional media, specialized conflict monitors
Social media, unverified sources (always labeled)
URLs are canonicalized to prevent duplicate ingestion. Each piece of evidence is timestamped, geolocated, and tagged with source metadata.
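As a rough illustration of the canonicalization step, the sketch below normalizes a URL with Python's standard library so that trivially different links map to one key. The exact rules the platform applies are not documented here. The function name `canonicalize_url` and the `TRACKING_PARAMS` list are assumptions made for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of tracking parameters to drop; the platform's real
# list is not given in the text.
TRACKING_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
    "fbclid", "gclid",
}


def canonicalize_url(url: str) -> str:
    """Return a normalized form of `url` suitable as a deduplication key."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()

    # Keep the port only when it is not the default for the scheme.
    port = parts.port
    is_default = (scheme == "http" and port == 80) or (scheme == "https" and port == 443)
    if port and not is_default:
        host = f"{host}:{port}"

    # Treat "/path" and "/path/" as the same resource.
    path = parts.path.rstrip("/") or "/"

    # Drop tracking parameters and sort the rest for a stable ordering.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    query = urlencode(sorted(kept))

    # The fragment is discarded: it does not change the fetched document.
    return urlunsplit((scheme, host, path, query, ""))
```

Two evidence items would then count as duplicates when their canonical URLs are equal, for example when they differ only in letter case, a trailing slash, or tracking parameters.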
Evidence items are grouped into story clusters using title similarity and temporal proximity. The algorithm uses the Dice coefficient for text similarity and adds a time-decay bonus for items that fall within a 24-hour window.
Each cluster aggregates evidence from multiple sources, enabling cross-referencing and reliability-weighted scoring.
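The similarity measure described above can be sketched as follows. This is a minimal illustration, not the platform's implementation: it assumes the Dice coefficient is taken over character bigrams of the normalized titles and that the bonus decays linearly to zero across the 24-hour window. The `MAX_BONUS` value and all function names are hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)  # window length stated in the text
MAX_BONUS = 0.2               # hypothetical bonus for simultaneous items


def bigrams(text: str) -> set[str]:
    """Character bigrams of a lowercased, whitespace-normalized title."""
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + 2] for i in range(len(normalized) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient: 2 * |X & Y| / (|X| + |Y|) over bigram sets."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def time_bonus(t1: datetime, t2: datetime) -> float:
    """Bonus that shrinks linearly with the gap and is zero outside the window."""
    gap = abs(t1 - t2)
    if gap >= WINDOW:
        return 0.0
    return MAX_BONUS * (1 - gap / WINDOW)


def cluster_score(title_a: str, time_a: datetime,
                  title_b: str, time_b: datetime) -> float:
    """Combined similarity, capped at 1.0."""
    return min(1.0, dice(title_a, title_b) + time_bonus(time_a, time_b))
```

For instance, `dice("night", "nacht")` is 0.25, because the two titles share one bigram (`ht`) out of four each. An item would join a cluster when its `cluster_score` against the cluster's items exceeds some threshold; the text does not state that threshold.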
Events are extracted from clusters using structured LLM prompts with strict JSON schema validation. The AI extracts:
The AI is instructed to never invent information not present in the evidence. All outputs are validated against the schema before storage.
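A validate-before-storage step could look like the sketch below. It uses only the standard library and rejects missing fields, unexpected fields, and wrong types. The field names in `EVENT_SCHEMA` are placeholders invented for this example, since the fields actually extracted are not listed in this text, and `validate_event` is a hypothetical helper.

```python
import json

# Hypothetical schema: field name -> required Python type. These fields are
# placeholders; the platform's real schema is not given in the text.
EVENT_SCHEMA = {
    "title": str,
    "country": str,
    "occurred_on": str,
    "source_urls": list,
}


def validate_event(raw: str, schema: dict = EVENT_SCHEMA) -> dict:
    """Parse model output and raise ValueError unless it matches the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    # Strict mode: no field may be missing and no extra field is accepted.
    missing = sorted(schema.keys() - data.keys())
    extra = sorted(data.keys() - schema.keys())
    if missing or extra:
        raise ValueError(f"missing fields: {missing}; unexpected fields: {extra}")

    for name, expected_type in schema.items():
        if not isinstance(data[name], expected_type):
            raise ValueError(f"field {name!r} must be {expected_type.__name__}")

    return data
```

Only output that passes `validate_event` would be stored; anything else would be rejected or retried. A production system would more likely use a full JSON Schema validator than hand-written checks like these.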
Events progress through four status levels:
Every status change is logged with reason codes, evidence URLs, and the entity that made the change.
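One way to model such an audit trail is an append-only log of immutable records, sketched below. The class and field names are assumptions made for illustration, and statuses are kept as plain strings because the four status levels are not named in this text.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class StatusChange:
    """One immutable audit record (hypothetical field names)."""
    event_id: str
    from_status: str
    to_status: str
    reason_code: str
    evidence_urls: tuple[str, ...]
    changed_by: str  # the user or system component that made the change
    changed_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class AuditLog:
    """Append-only log: entries can be added and read, never edited."""

    def __init__(self) -> None:
        self._entries: list[StatusChange] = []

    def record(self, change: StatusChange) -> None:
        if change.from_status == change.to_status:
            raise ValueError("status did not change")
        if not change.reason_code:
            raise ValueError("a reason code is required")
        self._entries.append(change)

    def history(self, event_id: str) -> list[StatusChange]:
        """All recorded changes for one event, in insertion order."""
        return [e for e in self._entries if e.event_id == event_id]
```

Freezing the dataclass keeps a stored record from being altered after the fact, which is the property an audit trail depends on.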
The forecasting pipeline computes daily feature vectors for each country, including:
Risk scores (0–100) are computed using a feature-weighted model with probability estimates at 7-day, 30-day, and 90-day horizons. Weekly backtesting computes Brier scores and hit rates to monitor calibration.
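For reference, the Brier score is the mean squared difference between forecast probabilities and binary outcomes, so lower is better and 0 is a perfect score. The sketch below computes it together with a simple hit rate. The hit-rate definition used here (a forecast counts as a hit when it falls on the correct side of a 0.5 threshold) is an assumption, as the text does not define it.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean of (p - o)^2 over forecasts; o is 1 if the event occurred, else 0."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def hit_rate(probs: Sequence[float], outcomes: Sequence[int],
             threshold: float = 0.5) -> float:
    """Share of forecasts on the correct side of `threshold` (assumed definition)."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    hits = sum((p >= threshold) == bool(o) for p, o in zip(probs, outcomes))
    return hits / len(probs)
```

A forecaster that always predicts 0.5 scores a Brier score of 0.25 regardless of outcomes, which makes 0.25 a useful baseline when reading backtest results. In a weekly backtest, these functions would be applied separately to each horizon (7-day, 30-day, 90-day) once the outcomes for that horizon are known.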
When an event is found to be inaccurate: