Which Language Model Masters Connect Four?

Large language models from OpenAI, xAI, and Google play Connect Four against each other.
LLM Games is an experiment from Ramp Labs.
Crosstable (each cell shows the row agent's score / games played in that head-to-head matchup):

 Agent           | GPT-5-high   | O3      | GPT-5-medium | Grok4   | O4Mini  | Gemini25Flash | Gemini25Pro | gpt-oss-20B | gpt-oss-120B | Total
 ----------------+--------------+---------+--------------+---------+---------+---------------+-------------+-------------+--------------+----------
 GPT-5-high      | -            | 2.0 / 2 | 0.0 / 0      | 2.0 / 2 | 2.0 / 2 | 2.0 / 2       | 2.0 / 2     | 2.0 / 2     | 2.0 / 2      | 14.0 / 14
 O3              | 0.0 / 2      | -       | 1.5 / 2      | 1.0 / 2 | 2.0 / 2 | 2.0 / 2       | 2.0 / 2     | 2.0 / 2     | 2.0 / 2      | 12.5 / 16
 GPT-5-medium    | 0.0 / 0      | 0.5 / 2 | -            | 2.0 / 2 | 2.0 / 2 | 0.0 / 2       | 2.0 / 2     | 2.0 / 2     | 2.0 / 2      | 10.5 / 14
 Grok4           | 0.0 / 2      | 1.0 / 2 | 0.0 / 2      | -       | 2.0 / 2 | 2.0 / 2       | 2.0 / 2     | 2.0 / 2     | 1.0 / 2      | 10.0 / 16
 O4Mini          | 0.0 / 2      | 0.0 / 2 | 0.0 / 2      | 0.0 / 2 | -       | 1.0 / 2       | 2.0 / 2     | 2.0 / 2     | 1.0 / 2      |  6.0 / 16
 Gemini25Flash   | 0.0 / 2      | 0.0 / 2 | 2.0 / 2      | 0.0 / 2 | 1.0 / 2 | -             | 1.0 / 2     | 1.0 / 2     | 2.0 / 2      |  7.0 / 16
 Gemini25Pro     | 0.0 / 2      | 0.0 / 2 | 0.0 / 2      | 0.0 / 2 | 0.0 / 2 | 1.0 / 2       | -           | 2.0 / 2     | 2.0 / 2      |  5.0 / 16
 gpt-oss-20B     | 0.0 / 2      | 0.0 / 2 | 0.0 / 2      | 0.0 / 2 | 0.0 / 2 | 1.0 / 2       | 0.0 / 2     | -           | 2.0 / 2      |  3.0 / 16
 gpt-oss-120B    | 0.0 / 2      | 0.0 / 2 | 0.0 / 2      | 1.0 / 2 | 1.0 / 2 | 0.0 / 2       | 0.0 / 2     | 0.0 / 2     | -            |  2.0 / 16
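
The Total column is the sum of the head-to-head cells in each row. A minimal Python sketch of that arithmetic, using the O3 row as input (the dictionary layout and the name row_total are illustrative assumptions, not part of the original experiment):

```python
# Head-to-head cells for one crosstable row as (score, games) per opponent.
# Values are copied from the O3 row of the crosstable.
o3_row = {
    "GPT-5-high": (0.0, 2),
    "GPT-5-medium": (1.5, 2),
    "Grok4": (1.0, 2),
    "O4Mini": (2.0, 2),
    "Gemini25Flash": (2.0, 2),
    "Gemini25Pro": (2.0, 2),
    "gpt-oss-20B": (2.0, 2),
    "gpt-oss-120B": (2.0, 2),
}


def row_total(row):
    """Sum scores and games over all opponents in one crosstable row."""
    score = sum(s for s, _ in row.values())
    games = sum(g for _, g in row.values())
    return score, games


score, games = row_total(o3_row)
print(f"{score:.1f} / {games}")  # 12.5 / 16
```

The result matches the Total printed for O3. The same check can be repeated for any other row.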

 Final Elo Ratings:
 GPT-5-high: 1728.4
 O3: 1681.9
 GPT-5-medium: 1662.0
 Grok4: 1616.3
 O4Mini: 1547.4
 Gemini25Flash: 1535.7
 Gemini25Pro: 1500.4
 gpt-oss-120B: 1456.1
 gpt-oss-20B: 1453.8
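
The page does not say how these ratings were computed. As a rough illustration only, here is a sketch of the standard Elo formulas in Python. The K-factor of 32, the per-game update order, and the function names are assumptions and may differ from what the experiment actually used.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return both ratings after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k=32 is an assumed K-factor; the source does not state one.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    return rating_a + delta, rating_b - delta


# Expected score implied by the final ratings of GPT-5-high and O3.
e = expected_score(1728.4, 1681.9)
print(f"{e:.2f}")  # 0.57

# Two equally rated players; A wins one game.
print(update_elo(1500.0, 1500.0, 1.0))  # (1516.0, 1484.0)
```

Under these formulas, a gap of about 46 points corresponds to the higher-rated agent scoring roughly 0.57 points per game on average. This shows how a rating gap is usually read. It does not reproduce the ratings listed here.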