Comparative Hotness Viz
Critique of the 1-10 Scale
Social flaws (in the wild)
- It is strategically misreported for social positioning
- These misreports lead to distortive feedback loops
- Central tendency bias crushes the information in the tails
- It implies false specificity
- It implies false commensurability across ratings
Analytical flaws (in the lab)
- Effects of hotness are nonlinear, and the scale is not calibrated to this nonlinearity
- Hotness can be negative, but the scale lacks a natural zero point
- Hotness is relational, but the scale does not encode relational information
- Hotness is causally efficacious, but values on the scale are not readily interpretable as causal variables
What is hotness?
- Relative hotness is well-modeled as the attractive or repulsive tendency resulting from one person's appearance, with respect to another person
- Objective hotness is the aggregation of the measurements of this tendency, as considered across multiple people
Under this definition, men of reproductive age most frequently have negative hotness with respect to women. Women of reproductive age typically have positive hotness with respect to men.
This model clarifies hotness as an observable physical phenomenon rather than an abstract subjective assessment.
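As a rough sketch of these two definitions: relative hotness can be stored as a signed number per (observer, target) pair, and objective hotness computed by aggregating over a cohort of observers. The names, the sample values, and the choice of a plain mean as the aggregation are all assumptions for illustration. The definition above only says "aggregation".

```python
from statistics import mean

# Hypothetical signed measurements: (observer, target) -> tendency.
# Positive = attraction, negative = repulsion, 0 = indifference.
relative_hotness = {
    ("ann", "dan"): -1.5,
    ("bea", "dan"): 0.5,
    ("cat", "dan"): -2.0,
    ("ann", "eve"): 1.0,
    ("bea", "eve"): 2.0,
}


def objective_hotness(target, cohort, measurements):
    """Mean signed tendency toward `target` across observers in `cohort`.

    The mean is an assumed aggregation rule. Returns None when nobody in
    the cohort has a measurement for the target.
    """
    values = [
        measurements[(observer, target)]
        for observer in cohort
        if (observer, target) in measurements
    ]
    return mean(values) if values else None


cohort = ["ann", "bea", "cat"]
dan_score = objective_hotness("dan", cohort, relative_hotness)  # net repulsive
eve_score = objective_hotness("eve", cohort, relative_hotness)  # net attractive
```

Note that the cohort is an explicit argument, so the comparison class is stated rather than smuggled in.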
Ideal features of a measure
Our measure should clearly represent the intensity of attraction or repulsion resulting from a person's appearance, and clearly identify the cohort across which this holds.
The ideal instrument would:
- Represent negative hotness intuitively
- Have stable distance units: 4→5 should not be categorically different from 8→9
- Let us use objective observational measures to locate people on the scale
- Be readily interpretable as a causal variable/basis for prediction
- Naturally support aggregation and commensuration of ratings
- Not require ad hoc correction for nonlinearity in the effects of hotness
- Not smuggle assumptions about comparison class into the data
A superior measure
One scale that satisfies these criteria is a dollar scale: the amount you would pay (or would have to be paid) to have sex with, kiss, date, or trade appearances with a given person. For obvious reasons, valid data on this would be devilishly hard to collect, but it gestures at the ideal.
More convenient measures:
- Pure percentile: e.g., 95th percentile hotness, i.e., 1 in 20 members of the comparison class would be rated as hotter.
- Semantic binning plus rarity markers: e.g., "Super hot - I only see people that hot once a month or so".
These avoid many of the problems of the 1-10 scale -- they describe reality more directly and with fewer smuggled assumptions, while being more obviously actionable -- but they do not satisfy all of the ideal features.
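A minimal sketch of the pure percentile measure and its "1 in N" rarity reading. It assumes we already have some numeric score for every member of the comparison class. The function names and the sample scores are hypothetical.

```python
def percentile_rank(score, comparison_class):
    """Percent of the comparison class scoring strictly below `score`."""
    if not comparison_class:
        raise ValueError("comparison class is empty")
    below = sum(1 for other in comparison_class if other < score)
    return 100.0 * below / len(comparison_class)


def one_in_n_hotter(score, comparison_class):
    """Rarity reading: about 1 in N members score higher (None if nobody does)."""
    hotter = sum(1 for other in comparison_class if other > score)
    if hotter == 0:
        return None
    return len(comparison_class) / hotter


# Hypothetical comparison class of 100 people with scores 1..100.
scores = list(range(1, 101))
rank = percentile_rank(95.5, scores)    # 95 of 100 score lower
rarity = one_in_n_hotter(95.5, scores)  # 5 of 100 score higher, i.e. 1 in 20
```

Only the ordering of the scores matters here, which is why this sidesteps the distance-unit problem but also why it cannot represent negative hotness.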
Alternate paradigm:
IMO, really we should just abandon the idea of a scale altogether. It may be useful for reporting results to the public in terms they grok, but it's as likely to mislead as enlighten.
Instead, as always, we should just be building models and testing them. For example, suppose objective hotness really is a latent variable. In that world, subjective hotness ratings are a function of objective hotness plus some noise. We test this model against the data. Then we say, okay, maybe the rater's own hotness is also a latent variable, so we model that and test again. Eventually we find the model that best satisfies our predictive goals and supports useful intuitions.
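Here is a toy version of that loop, on simulated data rather than real ratings. Everything in it is an assumption for illustration: the additive form, the Gaussian noise, the idea that the rater-side latent variable simply shifts that rater's scores up or down, and the use of held-out RMSE as the "predictive goal".

```python
import random
from statistics import mean


def simulate(n_targets=30, n_raters=20, noise_sd=0.5, seed=0):
    """Two rounds of ratings generated from hypothetical latent variables."""
    rng = random.Random(seed)
    target_latent = [rng.gauss(0, 1) for _ in range(n_targets)]
    rater_latent = [rng.gauss(0, 1) for _ in range(n_raters)]

    def one_round():
        # ratings[r][t] = target latent + rater latent + noise
        return [
            [target_latent[t] + rater_latent[r] + rng.gauss(0, noise_sd)
             for t in range(n_targets)]
            for r in range(n_raters)
        ]

    return one_round(), one_round()


def fit_target_only(train):
    """Model 1: rating = target effect + noise."""
    target_means = [mean(row[t] for row in train) for t in range(len(train[0]))]
    return lambda r, t: target_means[t]


def fit_target_plus_rater(train):
    """Model 2: rating = grand mean + target effect + rater effect + noise."""
    grand = mean(x for row in train for x in row)
    target_eff = [mean(row[t] for row in train) - grand
                  for t in range(len(train[0]))]
    rater_eff = [mean(row) - grand for row in train]
    return lambda r, t: grand + target_eff[t] + rater_eff[r]


def rmse(predict, data):
    """Root mean squared prediction error over every (rater, target) cell."""
    errors = [(predict(r, t) - x) ** 2
              for r, row in enumerate(data) for t, x in enumerate(row)]
    return mean(errors) ** 0.5


# Fit on the first round, score on the second so the richer model
# cannot win just by memorizing noise.
train, test = simulate()
rmse_1 = rmse(fit_target_only(train), test)
rmse_2 = rmse(fit_target_plus_rater(train), test)
```

Because the simulated data really does contain a rater-side variable, the second model should predict the held-out round better. On real data that is exactly the question being tested, and the answer could go the other way.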
Then we can extract something like a 1-10 rating from that model, and say, "ok, this person is a 7/10," to satisfy some social need, but we're not tricking ourselves by disguising the other variables or overstating the predictive power of the "1-10 hotness" variable.
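One way that extraction could look, as a sketch: take the model's estimate for a person and bin it by decile within an explicit comparison class. The decile mapping and the function name are assumptions. Any monotone mapping from model estimates to 1-10 would serve the same social purpose.

```python
def to_ten_point(estimate, comparison_estimates):
    """Map a model estimate to 1..10 by its decile in the comparison class."""
    if not comparison_estimates:
        raise ValueError("comparison class is empty")
    below = sum(1 for other in comparison_estimates if other < estimate)
    # Integer arithmetic avoids float rounding at decile boundaries.
    decile = (10 * below) // len(comparison_estimates)
    return min(10, decile + 1)


# Hypothetical model estimates for a comparison class of 100 people,
# evenly spaced from -5.0 to 4.9.
comparison = [i / 10 for i in range(-50, 50)]
rating = to_ten_point(1.05, comparison)  # 61 of 100 estimates fall below
```

The 1-10 number is derived from the model and the stated comparison class, so the other variables stay visible upstream instead of being folded into the rating.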