
How We Score

Three ways to measure. All designed so you can be part of it.

One number. Zero to one.

Imagine you had to build one robot that could do everything a human can do. Run, paint, tell jokes, do maths, comfort a crying friend.

0 means robots can't do any of it. 1 means a single robot can match every human record. Right now, we're at about 0.05.


The Stopwatch

Absolute Records · ~50 records

Anything you can time, count, or measure. Sprint speed, push-ups, digits of pi, chess rating.

How it works

  1. We measure the robot.
  2. We measure the human.
  3. We divide.

Example

Usain Bolt runs 100m in 9.58 seconds. The fastest humanoid robot does it in 48 seconds.

Score = 9.58 ÷ 48 = 0.20

Rules

  • If the robot beats the human, the score caps at 1.0. No extra credit for being superhuman.
  • Same test, same conditions. If the human had 60 seconds, the robot gets 60 seconds.
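
The Stopwatch arithmetic can be sketched in a few lines of Python. This is a rough illustration, not the site's actual code: the function name `absolute_score` and the `lower_is_better` flag are assumptions. The source only shows the timed case (human ÷ robot); flipping the ratio for counts and ratings, where higher is better, is our guess at how those records would work.

```python
def absolute_score(human: float, robot: float, lower_is_better: bool = True) -> float:
    """Compare a robot result with the human record, capped at 1.0.

    lower_is_better=True  -> times (sprint): score = human / robot
    lower_is_better=False -> counts, ratings (assumed): score = robot / human
    """
    if human <= 0:
        raise ValueError("the human record must be a positive number")
    if robot <= 0:
        return 0.0  # assumption: no valid robot result scores zero
    ratio = human / robot if lower_is_better else robot / human
    return min(ratio, 1.0)  # no extra credit for being superhuman


# The 100m example: Bolt at 9.58 s, the robot at 48 s.
print(round(absolute_score(9.58, 48.0), 2))  # 0.2
```

The `min(..., 1.0)` line is the cap from the rules: a robot that runs faster than the human still scores exactly 1.0.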

The shareable moment

"The robot was 5× slower."


The Blind Test

Creative Records · ~11 records

Painting, music, poetry, short stories, song writing, logo design. Anything where quality is subjective.

How it works

  1. A human creates something following a brief.
  2. An AI creates the same thing, same brief.
  3. Both are shown to voters — without labels.
  4. You vote. Then we reveal.

Example

60% of voters prefer the AI painting. Only 30% correctly identified it as AI.

Score = 0.60 × (1 − 0.30) = 0.42

Rules

  • Minimum 1,000 votes before a score counts.
  • Entries submitted by verified teams, not random uploads.
  • Results lock after 1,000 votes AND 48 hours.
  • One vote per person per test.
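
The lock rule above needs both conditions at once, which a short Python sketch makes concrete. The names `results_locked`, `BLIND_TEST_MIN_VOTES` and `MIN_OPEN_TIME` are hypothetical. We also assume the 48 hours are counted from the moment voting opens, which the rules don't spell out.

```python
from datetime import datetime, timedelta

BLIND_TEST_MIN_VOTES = 1_000
MIN_OPEN_TIME = timedelta(hours=48)  # assumption: measured from when voting opens


def results_locked(vote_count: int, opened_at: datetime, now: datetime) -> bool:
    """True only when BOTH the vote minimum and the 48-hour window are met."""
    enough_votes = vote_count >= BLIND_TEST_MIN_VOTES
    open_long_enough = (now - opened_at) >= MIN_OPEN_TIME
    return enough_votes and open_long_enough


opened = datetime(2025, 1, 1, 12, 0)
# Plenty of votes, but only a day in: still open.
print(results_locked(5_000, opened, opened + timedelta(hours=24)))  # False
# 1,000 votes and exactly 48 hours: locked.
print(results_locked(1_000, opened, opened + timedelta(hours=48)))  # True
```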

The shareable moment

"Can YOU tell which is AI?"

The Thumbs Up

Pass/Fail Records · ~15 records

Catching a ball. Comforting an upset person. Standing on one leg. Things that aren't about being better — they're about being good enough.

How it works

  1. The robot attempts the task on video.
  2. The video is published here.
  3. You watch. You answer one question:
  4. "Did it do it? Yes or No."

Example

A robot tries to comfort a crying child. 31% of voters say yes.

Score = 0.31

Rules

  • Minimum 500 votes before a score counts.
  • Video must show the full attempt, unedited.
  • The question is always the same: did it do it?
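
A Thumbs Up score is just the share of yes votes, once enough people have voted. Here is a minimal Python sketch; `thumbs_up_score` and `THUMBS_UP_MIN_VOTES` are made-up names, and returning `None` below the minimum is our assumption about what "doesn't count yet" looks like.

```python
from typing import Optional

THUMBS_UP_MIN_VOTES = 500


def thumbs_up_score(yes_votes: int, total_votes: int) -> Optional[float]:
    """Fraction of voters who answered "yes", or None if the score doesn't count yet."""
    if total_votes < THUMBS_UP_MIN_VOTES:
        return None  # assumption: below the minimum there is no score at all
    return yes_votes / total_votes


# The comforting-robot example: 31% of 1,000 voters say yes.
print(thumbs_up_score(310, 1_000))  # 0.31
print(thumbs_up_score(200, 400))    # None (fewer than 500 votes)
```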

The shareable moment

"Does this count?"

What about Mixed records?

About 5 records combine measurement with judgment — like cooking a meal (time + taste) or building a bridge (speed + structural quality). These use an Absolute score for the measurable part and a Thumbs Up vote for the quality part, averaged together.
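
As a rough Python sketch, a Mixed score could be computed like this. The function name `mixed_score` is hypothetical, and the equal 50/50 weighting is our reading of "averaged together". The numbers in the example are invented for illustration.

```python
def mixed_score(absolute_part: float, thumbs_up_part: float) -> float:
    """Average the measured part and the voted part (assumed equal weights)."""
    return (absolute_part + thumbs_up_part) / 2


# Invented cooking example: the robot is half as fast as the human cook (0.50),
# and a quarter of voters say the meal passes (0.25).
print(mixed_score(0.50, 0.25))  # 0.375
```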

How Scores Add Up

1. Record → Axis

Take the best record in each axis. If a robot scores 0.40 at chess and 0.00 at Go, the Strategy axis scores 0.40. One breakthrough proves the axis is penetrable.

2. Axis → Domain

Average all axis scores in a domain. Mind has 8 axes — average them all. This means a machine can't hide weakness behind one strong axis.

3. Domain → HRI

Average all 6 domain scores. That's the number. One number. How close is a single robot to matching everything a human can do?
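
The three roll-up steps can be sketched in Python as a best-of followed by two plain averages. This is an illustration under assumptions, not the real pipeline: the function names are hypothetical, and the toy data uses only two domains with invented axis names ("Language", "Sprint", "Balance") and invented scores. Only the chess and Go numbers come from the example above.

```python
from statistics import mean


def axis_score(record_scores: list[float]) -> float:
    """Record -> Axis: the best single record defines the axis."""
    return max(record_scores, default=0.0)  # assumption: no records means 0


def domain_score(axes: dict[str, list[float]]) -> float:
    """Axis -> Domain: plain average of every axis score in the domain."""
    return mean([axis_score(records) for records in axes.values()])


def hri(domains: dict[str, dict[str, list[float]]]) -> float:
    """Domain -> HRI: plain average of every domain score."""
    return mean([domain_score(axes) for axes in domains.values()])


# Toy data: two domains instead of six, invented numbers.
toy = {
    "Mind": {"Strategy": [0.40, 0.00], "Language": [0.20]},  # chess 0.40, Go 0.00
    "Body": {"Sprint": [0.20], "Balance": [0.00]},
}
print(axis_score(toy["Mind"]["Strategy"]))  # 0.4
print(round(hri(toy), 2))                   # 0.2
```

Using the maximum at the axis level and the mean everywhere above it mirrors the text: one breakthrough lifts an axis, but a weak axis still drags its domain down.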

Why "one robot" matters

It would be easy to score 1.0 if you could use a different machine for each task — a chess computer for chess, Atlas for backflips, a surgical robot for stitching. But that's not the question.

The question is: can you build one machine that does it all?

A chess engine can't catch a ball. Atlas can't write a poem. GPT can't do a push-up. The score will stay low until someone builds a generalist.

Built to Share

🗳️

The Vote

Every Blind Test and Pass/Fail attempt is a public vote. You're not watching — you're judging.

🎭

The Reveal

10,000 people voted that Painting A was "obviously human." The curtain drops. It was the robot. That moment is the content.

🔢

The Number

"We're at 0.05." Simple enough for a headline, specific enough to track over time.

💬

The Debate

"Does that robot backflip really count?" Pass/fail votes generate arguments. Arguments generate engagement.

📊

The Leaderboard

Which domain is closest to 1.0? Mind leads. Body lags. Why? That's a conversation.

📰

The Update

When a new AI achievement drops, the score changes. "HRI just jumped from 0.05 to 0.07." That's a news story.

The 10-year-old version

What is the Human Record Index?

A score that tells you how close robots are to being as good as humans at everything.

What's the score right now?

About 0.05 out of 1.00. Robots are 5% of the way there.

How do you work it out?

Three ways. Races and tests — time the robot and compare. Secret voting — a human and a robot both make something, you vote for the best one without knowing which is which. Yes or no — watch a robot try to catch a ball and vote if it worked.

Why one robot?

Because using different robots for different tasks is cheating. The whole point is: can you build one machine that does it all?

Can I vote?

Yes. Every Blind Test and every Pass/Fail is voted on by real people right here.

Blind Test Formula — The Details

The Blind Test score captures two things at once: quality (do people prefer the AI's work?) and deception (can people tell it's AI?).

Score = (AI preference %) × (1 − detection rate)

AI preference % — the fraction of voters who chose the AI entry as better. If nobody prefers it, the score is zero regardless of how good the disguise is.

Detection rate — the fraction of voters who correctly identified which entry was AI. If everyone can tell, the multiplier drops to zero. If nobody can tell, the multiplier is 1.0.

This means an AI scores highest when people prefer its work and can't tell it's not human. That's the real test.

Bad AI

0.20 × (1 − 0.90) = 0.02

Nobody likes it, everyone spots it

Good but obvious

0.70 × (1 − 0.80) = 0.14

People like it but know it's AI

Indistinguishable

0.55 × (1 − 0.45) = 0.30

Slight preference, coin-flip detection
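
The formula and the three cases above can be checked with a short Python sketch. The function name `blind_test_score` is hypothetical, and rejecting inputs outside 0 to 1 is our own safety check rather than a stated rule.

```python
def blind_test_score(ai_preference: float, detection_rate: float) -> float:
    """Score = (AI preference) x (1 - detection rate), both given as fractions 0..1."""
    for name, value in (("ai_preference", ai_preference),
                        ("detection_rate", detection_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return ai_preference * (1.0 - detection_rate)


cases = {
    "Bad AI": (0.20, 0.90),
    "Good but obvious": (0.70, 0.80),
    "Indistinguishable": (0.55, 0.45),
}
for label, (preference, detection) in cases.items():
    print(f"{label}: {blind_test_score(preference, detection):.2f}")
# Bad AI: 0.02
# Good but obvious: 0.14
# Indistinguishable: 0.30
```

Because the two factors are multiplied, either one hitting zero wipes out the score: work nobody prefers scores 0 however well it hides, and work everyone spots scores 0 however much they like it.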