Data Sources & Methodology
Our Data
Every claim OPAX makes is backed by publicly available records. Here is everything we aggregate, how we process it, and where it comes from: 14 categories of government data, cross-referenced to expose the gaps between rhetoric and reality.
Parliamentary Speeches
125 years of what they said on the floor: federal, state, and committee.
Voting Records
How they actually voted, not what they said they would.
Political Donations
Follow the money. Every disclosed dollar, classified by industry.
Government Contracts
Where the money goes after the votes are cast.
Government Grants
Discretionary spending mapped to electorates and donors.
MP Expenses
How parliamentarians spend public money on themselves.
Lobbying
Who has the ear of government.
Ministerial Meetings
Behind closed doors: disclosed diary entries.
Board Appointments
Government patronage networks and revolving doors.
Audit Reports
Independent scrutiny of government programs.
News Coverage
Media accountability and policy coverage.
MP Registered Interests
What they own, who they work for, and what they haven't told you.
Electoral Data
Demographics, results, and the geography of representation.
Methodology
Raw data is only the beginning. Here is how we turn scattered records into a connected accountability graph.
Speaker-to-Member Linking
Over 74% of speeches are linked to specific MPs via name resolution, electorate matching, and party affiliation across all jurisdictions.
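As a rough illustration of this kind of linking, here is a minimal sketch that resolves a speaker label to a member by surname and then narrows ambiguous matches by electorate and party. The `Member` fields, the `link_speaker` function, and the sample roster are hypothetical and are not OPAX's actual schema or matching rules.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Member:
    member_id: int
    name: str
    electorate: str
    party: str


# Honorifics stripped before comparison (illustrative list, not exhaustive).
TITLES = re.compile(r"\b(?:mr|mrs|ms|dr|hon|senator|the)\b\.?", re.IGNORECASE)


def normalise(name: str) -> str:
    """Drop honorifics and punctuation, lower-case, collapse whitespace."""
    name = TITLES.sub(" ", name)
    name = re.sub(r"[^a-z\s]", " ", name.lower())
    return " ".join(name.split())


def link_speaker(
    speaker: str,
    roster: List[Member],
    electorate: Optional[str] = None,
    party: Optional[str] = None,
) -> Optional[Member]:
    """Return the unique matching member, or None if unknown or ambiguous."""
    tokens = normalise(speaker).split()
    if not tokens:
        return None
    surname = tokens[-1]
    candidates = [m for m in roster if normalise(m.name).split()[-1] == surname]
    # Narrow by electorate first, then party, only while still ambiguous.
    for attr, value in (("electorate", electorate), ("party", party)):
        if len(candidates) > 1 and value:
            candidates = [
                m for m in candidates if getattr(m, attr).lower() == value.lower()
            ]
    return candidates[0] if len(candidates) == 1 else None


# Hypothetical roster with made-up names and electorates.
ROSTER = [
    Member(1, "Jane Smith", "Northvale", "IND"),
    Member(2, "Tom Smith", "Southport", "ALP"),
    Member(3, "Ana Lopez", "Eastfield", "LIB"),
]
```

Returning `None` for ambiguous labels, rather than guessing, is one way a pipeline could end up with a linked share below 100%.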
Topic Classification
16 policy topics assigned via keyword-based classification. Every speech is scored against topic keyword lists for relevance ranking.
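A minimal sketch of keyword-based scoring, assuming each topic has a flat keyword list and that relevance is the share of a speech's words that hit the list. The two topics and their keywords are invented for illustration; the real 16 topics and their lists are not given on this page.

```python
import re
from collections import Counter

# Hypothetical keyword lists (placeholders, not OPAX's real topic definitions).
TOPIC_KEYWORDS = {
    "climate": ["emissions", "renewable", "climate"],
    "housing": ["rent", "mortgage", "housing"],
}


def score_topics(text, topic_keywords=TOPIC_KEYWORDS):
    """Score a speech against each topic; return topics sorted by relevance."""
    words = Counter(re.findall(r"[a-z']+", text.lower()))
    total = sum(words.values()) or 1  # avoid dividing by zero on empty text
    scores = {}
    for topic, keywords in topic_keywords.items():
        hits = sum(words[k] for k in keywords)
        if hits:
            scores[topic] = hits / total
    # Highest score first, so the result doubles as a relevance ranking.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

Normalising by speech length keeps long speeches from outranking short, focused ones purely on raw keyword counts; whether OPAX normalises this way is an assumption.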
Donation Industry Classification
27 industry sectors mapped to donor entities using rule-based classification at 99.9% coverage, enabling cross-referencing with policy positions and voting records.
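Rule-based classification of this kind can be sketched as an ordered list of patterns where the first match wins, plus a coverage figure for the share of donors that receive a label. The three sectors, the patterns, and the donor names below are hypothetical; the actual 27 sectors and rules are not listed here.

```python
import re

# Hypothetical rules as (industry label, pattern). Order matters: first match wins.
INDUSTRY_RULES = [
    ("mining", re.compile(r"\b(mining|minerals|coal)\b", re.IGNORECASE)),
    ("banking", re.compile(r"\b(bank|banking|credit union)\b", re.IGNORECASE)),
    ("unions", re.compile(r"\b(union|workers)\b", re.IGNORECASE)),
]


def classify_donor(name):
    """Return the first matching industry label, or 'unclassified'."""
    for industry, pattern in INDUSTRY_RULES:
        if pattern.search(name):
            return industry
    return "unclassified"


def coverage(names):
    """Fraction of donor names that received an industry label."""
    if not names:
        return 0.0
    classified = sum(1 for n in names if classify_donor(n) != "unclassified")
    return classified / len(names)
```

Placing the banking rule ahead of the unions rule means a name such as "Sample Credit Union" is labelled banking rather than unions, which shows why rule order is part of the classification logic.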
Semantic Embeddings
The all-MiniLM-L6-v2 model produces a 384-dimensional vector for every speech, enabling semantic search beyond keyword matching.
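To show how vectors like these support search, here is a small standard-library sketch that ranks stored speech vectors by cosine similarity to a query vector. It assumes the embeddings have already been computed elsewhere; the function names are illustrative, and the toy vectors in the usage note are 3-dimensional stand-ins for the 384-dimensional ones.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest(query_vec, speech_vecs, top_k=3):
    """Return the ids of the top_k speeches most similar to the query vector."""
    ranked = sorted(
        speech_vecs.items(),
        key=lambda kv: cosine(query_vec, kv[1]),
        reverse=True,
    )
    return [speech_id for speech_id, _ in ranked[:top_k]]
```

For example, with `{"s1": [1, 0, 0], "s2": [0, 1, 0], "s3": [0.9, 0.1, 0]}` and the query `[1, 0, 0]`, `nearest(..., top_k=2)` returns `["s1", "s3"]`. A brute-force scan like this is only a sketch; at scale a vector index would likely be used instead.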
Hybrid Search (RRF)
Reciprocal Rank Fusion merges the rankings from semantic (cosine-similarity) search and SQLite FTS5 BM25 keyword search, giving robust, typo-tolerant search across 1M+ records.
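Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) over every ranking it appears in, so only rank positions matter and the raw cosine and BM25 scores never need to be put on a common scale. The sketch below assumes k = 60, a conventional default; the constant OPAX uses is not stated here, and the function name is illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list, best first.

    rankings: iterable of lists, each ordered from most to least relevant.
    k: damping constant; 60 is a common default (an assumption here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], str(d)))
```

With a semantic ranking `["a", "b", "c"]` and a keyword ranking `["b", "d", "a"]`, the fused order is `["b", "a", "d", "c"]`: documents found by both searches rise above those found by only one.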
Entity Resolution
Fuzzy matching across donors, contractors, lobbyists, and board appointees links the same entity across disparate government datasets.
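A minimal sketch of fuzzy entity matching using Python's standard `difflib`, assuming names are first canonicalised by stripping punctuation and common company suffixes. The suffix list, the 0.9 threshold, and the company names in the usage note are illustrative assumptions, not OPAX's actual matcher.

```python
import re
from difflib import SequenceMatcher

# Corporate suffixes ignored when comparing names (illustrative list).
SUFFIXES = re.compile(r"\b(pty|ltd|limited|inc)\b")


def canonical(name):
    """Lower-case, strip punctuation and company suffixes, collapse spaces."""
    name = re.sub(r"[^a-z0-9\s]", " ", name.lower())
    name = SUFFIXES.sub(" ", name)
    return " ".join(name.split())


def same_entity(a, b, threshold=0.9):
    """True if the canonical forms are at least `threshold` similar."""
    ratio = SequenceMatcher(None, canonical(a), canonical(b)).ratio()
    return ratio >= threshold
```

Under these assumptions, "Acme Holdings Pty Ltd" and "ACME Holdings Limited" canonicalise to the same string and match, and a one-letter typo such as "Acme Holdngs" still clears the threshold, while unrelated names do not.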
RAG Pipeline
Retrieval-augmented generation assembles speeches, donations, votes, and contracts as context for Claude to produce evidence-based, citation-backed answers.
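The context-assembly step can be sketched as follows: retrieved records are numbered so the model can cite them as [n], and packed into a prompt until a size budget is reached. The `Evidence` type, the `build_prompt` function, the character budget, and the prompt wording are all hypothetical; the call to the model itself is deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Evidence:
    kind: str    # e.g. "speech", "donation", "vote", "contract"
    source: str  # where the record came from, shown to the reader as a citation
    text: str


def build_prompt(question: str, evidence: List[Evidence], max_chars: int = 2000) -> str:
    """Number each evidence item and pack as many as fit into the budget."""
    lines = []
    used = 0
    for i, item in enumerate(evidence, start=1):
        entry = f"[{i}] ({item.kind}) {item.text} (source: {item.source})"
        if used + len(entry) > max_chars:
            break  # stop once the context budget is exhausted
        lines.append(entry)
        used += len(entry)
    context = "\n".join(lines)
    return (
        "Answer using only the numbered evidence below and cite it as [n].\n\n"
        f"Evidence:\n{context}\n\nQuestion: {question}"
    )
```

Because evidence is added in order, passing it in already ranked by relevance means the budget cuts off the weakest items first.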
Attribution
OPAX is built entirely on publicly available data. We gratefully acknowledge the following sources and their contributors:
- Commonwealth of Australia: Hansard, AEC, AusTender, GrantConnect, IPEA, ANAO
- Tim Sherratt / GLAM Workbench: Historic Hansard XML corpus (1901–2005)
- Zenodo / Harvard Dataverse: Hansard datasets (1998–2022)
- OpenAustralia Foundation: OpenAustralia & TheyVoteForYou APIs
- NSW, Victorian, SA, QLD Parliaments: State Hansard APIs and open data
- Attorney-General's Department: Federal Register of Lobbyists
- QLD Premier's Office: Ministerial diary disclosures
- QLD Electoral Commission: State donation disclosures
- icacpls.github.io: Historical MP expenses and NSW donation data
- ABS: 2021 Census General Community Profile
- data.gov.au: AGOR, Directory.gov.au
- Guardian Australia & ABC News: Political news coverage
All data is used in accordance with applicable licences. Where data is Crown Copyright or Commonwealth Copyright, it is reproduced under open access provisions. OPAX does not claim ownership of source data.