Data Sources & Methodology

Our Data

Every claim OPAX makes is backed by publicly available records. Here is everything we aggregate, how we process it, and where it comes from: 14 categories of government data, cross-referenced to expose the gaps between rhetoric and reality.

Parliamentary Speeches

125 years of what they said on the floor: federal, state, and committee.

Voting Records

How they actually voted, not what they said they would.

Political Donations

Follow the money. Every disclosed dollar, classified by industry.

Government Contracts

Where the money goes after the votes are cast.

Government Grants

Discretionary spending mapped to electorates and donors.

MP Expenses

How parliamentarians spend public money on themselves.

Lobbying

Who has the ear of government.

Ministerial Meetings

Behind closed doors: disclosed diary entries.

Board Appointments

Government patronage networks and revolving doors.

Audit Reports

Independent scrutiny of government programs.

News Coverage

Media accountability and policy coverage.

MP Registered Interests

What they own, who they work for, and what they haven't told you.

Electoral Data

Demographics, results, and the geography of representation.

Methodology

Raw data is only the beginning. Here is how we turn scattered records into a connected accountability graph.

Speaker-to-Member Linking

Over 74% of speeches are linked to specific MPs via name resolution, electorate matching, and party affiliation across all jurisdictions.
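As a rough illustration of how name resolution, electorate matching, and party affiliation might combine, here is a minimal sketch. The `Member` record, the honorific list, and the tie-breaking order are all assumptions made for the example, not a description of the OPAX codebase.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Member:
    # Hypothetical record shape, for illustration only.
    member_id: int
    name: str
    electorate: str
    party: str


def normalise(name: str) -> str:
    """Lowercase a name and drop common honorifics and punctuation."""
    name = name.lower()
    name = re.sub(r"\b(the|hon|mr|mrs|ms|dr|senator|mp)\b\.?", " ", name)
    name = re.sub(r"[^a-z\s]", " ", name)
    return " ".join(name.split())


def surname(name: str) -> str:
    parts = normalise(name).split()
    return parts[-1] if parts else ""


def link_speaker(
    speaker: str,
    electorate: Optional[str],
    party: Optional[str],
    members: List[Member],
) -> Optional[Member]:
    """Return the single matching member, or None if ambiguous or unknown."""
    target = surname(speaker)
    if not target:
        return None
    candidates = [m for m in members if surname(m.name) == target]
    # Narrow by electorate first, then party, only while still ambiguous.
    for field, value in (("electorate", electorate), ("party", party)):
        if len(candidates) > 1 and value:
            narrowed = [
                m for m in candidates if getattr(m, field).lower() == value.lower()
            ]
            if narrowed:
                candidates = narrowed
    return candidates[0] if len(candidates) == 1 else None
```

Returning `None` for ambiguous matches rather than guessing is one way a linking rate below 100% could arise: a speech stays unlinked unless exactly one candidate survives.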

Topic Classification

16 policy topics assigned via keyword-based classification. Every speech is scored against topic keyword lists for relevance ranking.
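Keyword-based scoring of this kind can be sketched in a few lines. The two topics and their keyword lists below are placeholders invented for the example (the real lists are not published here), and normalising by speech length is an assumed design choice.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical keyword lists; the actual 16 topics are not reproduced here.
TOPIC_KEYWORDS: Dict[str, List[str]] = {
    "climate": ["climate", "emissions", "renewable"],
    "housing": ["housing", "rent", "mortgage"],
}


def score_topics(
    text: str, topic_keywords: Dict[str, List[str]] = TOPIC_KEYWORDS
) -> Dict[str, float]:
    """Score a speech against each topic as keyword hits per token.

    Topics with no hits are omitted; results are ordered by descending score.
    """
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    total = sum(tokens.values()) or 1
    scores = {}
    for topic, keywords in topic_keywords.items():
        hits = sum(tokens[k] for k in keywords)
        if hits:
            scores[topic] = hits / total
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

Dividing by the token count keeps long speeches from outranking short, focused ones purely on length.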

Donation Industry Classification

27 industry sectors mapped to donor entities using rule-based classification at 99.9% coverage, enabling cross-referencing with policy positions and voting records.
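To make "rule-based classification" and "coverage" concrete, the sketch below applies an ordered list of regular-expression rules to donor names and reports the share of names that received a sector. The three rules are hypothetical stand-ins, not the actual 27-sector rule set.

```python
import re
from typing import List, Optional

# Hypothetical rules; order matters because the first match wins.
INDUSTRY_RULES = [
    ("mining", re.compile(r"\b(mining|coal|minerals)\b", re.IGNORECASE)),
    ("finance", re.compile(r"\b(bank|insurance|capital)\b", re.IGNORECASE)),
    ("unions", re.compile(r"\b(union|workers)\b", re.IGNORECASE)),
]


def classify_donor(name: str) -> Optional[str]:
    """Return the first matching industry sector, or None if no rule fires."""
    for industry, pattern in INDUSTRY_RULES:
        if pattern.search(name):
            return industry
    return None


def coverage(names: List[str]) -> float:
    """Fraction of donor names that received any classification."""
    if not names:
        return 0.0
    classified = sum(classify_donor(n) is not None for n in names)
    return classified / len(names)
```

Under this reading, a coverage figure measures how many donors match at least one rule; it says nothing on its own about whether the assigned sector is correct.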

Semantic Embeddings

The all-MiniLM-L6-v2 model produces a 384-dimensional vector for every speech, enabling semantic search beyond keyword matching.
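Once each speech has a vector, semantic search reduces to ranking by vector similarity. The sketch below assumes the embeddings have already been computed elsewhere and uses tiny 3-dimensional toy vectors in place of real 384-dimensional ones; the function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: Sequence[float], doc_vecs: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (doc_id, similarity) pairs, most similar first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

At the scale of a million records, a vector index or a batched matrix multiplication would normally replace this Python loop; the arithmetic is the same.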

Hybrid Search (RRF)

Reciprocal Rank Fusion combines semantic similarity (cosine) with FTS5 BM25 keyword scoring for robust, typo-tolerant search across 1M+ records.
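Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) over every ranked list it appears in, so it needs only rank positions and never has to reconcile cosine and BM25 score scales. A minimal sketch follows; the constant `k=60` is the value commonly used in the literature and is an assumption here, since the page does not state which value OPAX uses.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[Hashable]], k: int = 60
) -> List[Hashable]:
    """Fuse several best-first ranked lists into one.

    Each document earns 1 / (k + rank) per list it appears in (rank starts
    at 1); documents are returned by descending total score.
    """
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Break exact ties deterministically by the document id's string form.
    return sorted(scores, key=lambda d: (-scores[d], str(d)))
```

For example, fusing a semantic ranking `["s3", "s1", "s2"]` with a keyword ranking `["s1", "s4", "s3"]` puts `s1` first, because it ranks highly in both lists, followed by `s3`, then the single-list results `s4` and `s2`.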

Entity Resolution

Fuzzy matching across donors, contractors, lobbyists, and board appointees links the same entity across disparate government datasets.
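One simple way to approximate this kind of fuzzy matching is to strip corporate suffixes, then compare the remaining strings with the standard library's `difflib.SequenceMatcher`. The suffix list, the 0.9 threshold, and the greedy clustering are assumptions chosen for the example; the matcher OPAX actually uses is not specified.

```python
import re
from difflib import SequenceMatcher
from typing import List

# Hypothetical list of tokens ignored when comparing entity names.
SUFFIXES = {"pty", "ltd", "limited", "inc", "the"}


def canonical(name: str) -> str:
    """Lowercase, strip punctuation, and drop common corporate suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in SUFFIXES)


def same_entity(a: str, b: str, threshold: float = 0.9) -> bool:
    """True if the canonical forms are at least `threshold` similar."""
    ca, cb = canonical(a), canonical(b)
    if not ca or not cb:
        return False  # nothing left to compare after stripping suffixes
    return SequenceMatcher(None, ca, cb).ratio() >= threshold


def cluster(names: List[str], threshold: float = 0.9) -> List[List[str]]:
    """Greedily group names; each name joins the first group it matches."""
    clusters: List[List[str]] = []
    for name in names:
        for group in clusters:
            if same_entity(name, group[0], threshold):
                group.append(name)
                break
        else:
            clusters.append([name])
    return clusters
```

Pairwise comparison like this is quadratic in the number of names, so a real pipeline would typically block candidates first (for example by shared first token or registered business number) before fuzzy scoring.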

RAG Pipeline

Retrieval-augmented generation assembles speeches, donations, votes, and contracts as context for Claude to produce evidence-based, citation-backed answers.
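The context-assembly step of such a pipeline can be sketched as follows: retrieved records are numbered so the model can cite them, and the context is cut off at a size budget. The `Evidence` shape, the prompt wording, and the character budget are hypothetical, and the call to the model itself is deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Evidence:
    # Hypothetical record shape for a retrieved item.
    kind: str    # e.g. "speech", "donation", "vote", "contract"
    source: str  # where the record came from
    text: str    # excerpt passed to the model


def build_prompt(question: str, evidence: List[Evidence], max_chars: int = 4000) -> str:
    """Assemble numbered evidence plus the question into a single prompt.

    Items are added in the given (best-first) order until the next one
    would exceed `max_chars`; the remainder is dropped.
    """
    lines = []
    used = 0
    for i, item in enumerate(evidence, start=1):
        entry = f"[{i}] ({item.kind}; {item.source}) {item.text}"
        if used + len(entry) > max_chars:
            break
        lines.append(entry)
        used += len(entry)
    context = "\n".join(lines)
    return (
        "Answer using only the numbered evidence below and cite items as [n].\n\n"
        f"Evidence:\n{context}\n\n"
        f"Question: {question}"
    )
```

Numbering the evidence is what makes answers checkable: each `[n]` in a generated answer can be traced back to a specific public record.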

Attribution

OPAX is built entirely on publicly available data. We gratefully acknowledge the following sources and their contributors:

  • Commonwealth of Australia: Hansard, AEC, AusTender, GrantConnect, IPEA, ANAO
  • Tim Sherratt / GLAM Workbench: Historic Hansard XML corpus (1901–2005)
  • Zenodo / Harvard Dataverse: Hansard datasets (1998–2022)
  • OpenAustralia Foundation: OpenAustralia & TheyVoteForYou APIs
  • NSW, Victorian, SA, QLD Parliaments: State Hansard APIs and open data
  • Attorney-General's Department: Federal Register of Lobbyists
  • QLD Premier's Office: Ministerial diary disclosures
  • QLD Electoral Commission: State donation disclosures
  • icacpls.github.io: Historical MP expenses and NSW donation data
  • ABS: 2021 Census General Community Profile
  • data.gov.au: AGOR, Directory.gov.au
  • Guardian Australia & ABC News: Political news coverage

All data is used in accordance with applicable licences. Where data is Crown Copyright or Commonwealth Copyright, it is reproduced under open access provisions. OPAX does not claim ownership of source data.