re: On the "Significance of Numbers"

Clinton Mah wrote a post (SemanticHacker: Significance of Numbers) about probabilities and semantics in which he used the example of a colleague who ate a can of soup without realizing the implications of the message on its label. This was my response:

Context is always the key. In your example, the number had units (mg), so you have the beginning of a context. You also have the ingredient listed (sodium) and the fact that this was on a can of soup, so presumably part of the context is "a person eating a can of soup". In your example, it is the broader context that gives the number its meaning.

This broader context includes the fact that the government has mandated that the can be labelled, and that the label indicate the amount of sodium. The government mandated this label so that the manufacturer would communicate this measurement to the customer. As a consumer of the soup, aside from having to read the can's label, I have to recall that there is a context mandating the label. But the trouble here is not that I don't know there's a context; rather, it is that I'm not completely "read-in" to that context. I don't remember that the actual daily recommended amount of sodium is 900 mg. Even if I don't remember the daily recommended amount, I can presume that there must be one I could compare to the measurement on the label, simply because the label exists and reports that measurement.

My experience of the context "eating a can of soup in an environment where the government has forced the manufacturer to report a sodium amount to me" may be incomplete. This just goes to show that I can exist in and interact with a context, even if I'm not a principal player in defining it.

Presumably, the probability associated with a term should tell me (and any software using the number) the relative confidence in a particular "interpretation" of the term. But this interpretation of the number (not the term) is general. It does not tell me what the term actually means in the "term's context"; it merely associates the term with several other terms, telling me which ones it is most likely associated with, given the instances of the term encountered earlier.
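To make that distinction concrete, here is a minimal sketch in Python. It assumes a deliberately simple co-occurrence model, which is not necessarily how SemanticHacker computes its numbers: the probability of another term given a term is taken to be the fraction of documents containing the term that also contain the other one. The function name `cooccurrence_probabilities` and the three-document toy corpus are hypothetical.

```python
from collections import Counter


def cooccurrence_probabilities(documents, term):
    """Estimate P(other | term) as the fraction of documents containing `term`
    that also contain `other`. A toy model assumed for illustration only."""
    term = term.lower()
    # Treat each document as a set of lowercase, whitespace-separated words.
    with_term = [set(doc.lower().split()) for doc in documents]
    with_term = [words for words in with_term if term in words]
    if not with_term:
        return {}  # No evidence about this term at all.
    counts = Counter()
    for words in with_term:
        for other in words - {term}:
            counts[other] += 1
    return {other: n / len(with_term) for other, n in counts.items()}


# Hypothetical three-document corpus.
toy_corpus = [
    "soup label sodium mg",
    "sodium daily recommended mg",
    "sodium chloride salt chemistry",
]

probs = cooccurrence_probabilities(toy_corpus, "sodium")
for other, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"P({other} | sodium) = {p:.2f}")
```

With this toy corpus, "mg" comes out as the most likely companion of "sodium" (2/3), yet nothing in that number says whether the sodium is in a can of soup or in a chemistry lab. The probability ranks associations; it does not carry the meaning.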

This meaning of the probability applies to every such probability number in the software system.

Did I miss your point? The meaning of words is context-dependent. By capturing your "semantic signature", you are really capturing an approximation of that context, but you have not captured the meaning of the term, and neither have the probability numbers.

Yes, the more terms I put into my search, the more likely it is that I will find other documents from the same context. But as you say, the training set must have included enough samples from that context to make a statistically useful estimate (otherwise I won't find my context correctly).
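The point about adding search terms can be sketched the same way. Again, this is a hypothetical illustration, not how SemanticHacker or any particular search engine works: each document is scored by how many query terms it shares, and the name `rank_by_overlap` and the two-document corpus are made up for the example.

```python
def rank_by_overlap(documents, query_terms):
    """Rank documents by how many query terms they contain (hypothetical sketch)."""
    query = {t.lower() for t in query_terms}
    scored = [(len(query & set(doc.lower().split())), doc) for doc in documents]
    # Highest overlap first; Python's sort is stable, so ties keep corpus order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical corpus: one chemistry note, one soup label.
corpus = [
    "sodium chloride salt chemistry",
    "soup label sodium mg",
]

print(rank_by_overlap(corpus, ["sodium"]))
print(rank_by_overlap(corpus, ["sodium", "soup", "label"]))
```

A query of just "sodium" ties the chemistry note with the soup label; adding "soup" and "label" pulls the soup document to the top. Remove the soup document from the corpus, though, and no amount of extra query terms will surface that context, which is the training-set caveat in miniature.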

I don't know; I think there must be something a bit different between the actual, experiential semantics in my head and the statistical estimation of semantics that you are describing here, at least in magnitude if not in kind.


