Tuesday, January 10, 2006

Pay Attention. This Will Be On The Test.

Have you ever been walking down the street when some strange guy came up to you and asked, "What sort of results are produced by game classification systems?" Yeah? Me, too! I bet you felt pretty stupid when you couldn't think of an answer right on the spot, right? And then you wanted to call your local game store friend, but the guy had already taken your cellphone and your wallet, right? Happens to me all the time.

Now you don't have to feel stupid! The following four comparisons distinguish the results of most game classification systems:

Single List vs Multiple List: Single lists present all games in their results. Their usefulness is based on how the results are sorted. Examples include lists ordered by: alphabet, popularity, sales, price, and so on. As long as all games can be compared based on one or more criteria, a single list can be presented. Typically, only parts of the list are relevant to most people, such as the top or bottom entries, or a comparison between the rankings of certain entries.

Multiple list systems return "buckets" of results; only those games that satisfy a certain criterion are returned, such as all games beginning with the letter "C", or all games from 1974 made in China. The results in a bucket may be sorted by additional criteria or may be left unordered.
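
To make the contrast concrete, here is a toy sketch in Python (the game data and sales figures are invented for illustration): a single list orders every game by some criterion, while a bucket returns only the games that match a filter.

    # Toy game data; the sales numbers are made up.
    games = [
        {"name": "Carcassonne", "year": 2000, "sales": 500},
        {"name": "Amun-Re", "year": 2003, "sales": 120},
        {"name": "Cosmic Encounter", "year": 1977, "sales": 300},
    ]

    # Single list: every game appears; usefulness comes from the ordering.
    by_sales = sorted(games, key=lambda g: g["sales"], reverse=True)

    # Multiple list: a "bucket" holds only the games matching a filter,
    # e.g. all games whose name begins with "C".
    bucket_c = [g for g in games if g["name"].startswith("C")]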

Single Criterion vs Multiple Criteria: A single-criterion system returns a list sorted by a single criterion (duh!), or derives buckets according to a single criterion. Single list/single criterion examples include: alphabetical order, popularity, publication date, volume sold, and so on. Single criterion buckets include: games according to first letter, a matching search term, a single mechanic or theme, and so on.

Multiple-criteria systems produce lists with secondary sorting, such as "by year, and alphabetically within each year". Or they can produce more specialized or generalized buckets by combining criteria with logical connectives (AND, OR, NOT, ...), such as "is an auction game and was made in 1991".
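
Both flavors are short in Python; again, the data is invented, and I'm assuming each game record carries a set of mechanics:

    # Toy data, invented for illustration.
    games = [
        {"name": "Alpha", "year": 1991, "mechanics": {"auction", "economic"}},
        {"name": "Bravo", "year": 1991, "mechanics": {"trading"}},
        {"name": "Carta", "year": 1990, "mechanics": {"auction"}},
    ]

    # Multiple-criteria list: by year, and alphabetically within each year.
    by_year_then_name = sorted(games, key=lambda g: (g["year"], g["name"]))

    # Multiple-criteria bucket: "is an auction game AND was made in 1991".
    auction_1991 = [g for g in games
                    if "auction" in g["mechanics"] and g["year"] == 1991]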

Single Value vs Multiple Value: A system may insist that a game be assigned to only a single value within a criterion. For instance, when the criterion is "theme", a game may be allowed to belong to "War of 1812" or "Western", but not both.

Some criteria really require that a game be assignable to multiple values. For instance, considering game mechanics, Amun-Re is both an "auction" game and an "economic" game. Demanding that the game be assigned to one or the other of these values, but not both, can give someone who is searching for the game a headache.
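
In code, the difference is just a scalar versus a set, something like this hypothetical sketch:

    # Single value: each game gets exactly one theme.
    themes = {"Amun-Re": "Egypt"}

    # Multiple value: a game may carry several mechanics at once.
    mechanics = {"Amun-Re": {"auction", "economic"}}

    # A multiple-value query is then a membership test, so the game
    # is findable under either mechanic.
    auction_games = [g for g, ms in mechanics.items() if "auction" in ms]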

Discrete vs Weighted Criteria: A discrete criterion assumes that a game either is or is not something. For instance, a game has sold 40,024 copies or has not sold 40,024 copies. It either was, or was not, made in 2004. When sorting by discrete criteria, it is easy to assign an item to its proper location based on its value within each criterion.

Using weighted criteria, you can assign a value a score between 0 and 1. For instance, we might decide that Amun-Re scores 0.94 as an auction game and 0.88 as a strategy game. If we rank games according to "is an auction game and is a strategy game", we need a formula that places Amun-Re in its correct location within a list or bucket of games. I'm no expert, but something like the following might work (I'd be happy to see real-life systems that do this better):

R = (C1 + (1 - C1)(N - 1)/N) * (C2 + (1 - C2)(N - 1)/N) * ...

where C1, C2, ... are the values within each criterion, N is the number of criteria, and R is the final ranking.

If the criteria themselves have weights (e.g. it is more important that the game be an auction game than an economic game), we can easily adjust the formula by multiplying the appropriate terms by the weight of each criterion.
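
To show what I mean, here is that formula as a short Python function (the optional per-criterion weights follow the one-line adjustment above, which is only one way to read it):

    def rank(scores, weights=None):
        # scores: one value in [0, 1] per criterion, e.g. [0.94, 0.88].
        # Each term is C + (1 - C)(N - 1)/N, so a low score on one
        # criterion dampens the product instead of zeroing it out.
        n = len(scores)
        if weights is None:
            weights = [1.0] * n
        r = 1.0
        for c, w in zip(scores, weights):
            r *= w * (c + (1 - c) * (n - 1) / n)
        return r

    # Amun-Re as an auction (0.94) and strategy (0.88) game:
    print(rank([0.94, 0.88]))  # ≈ 0.9118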


You can find all combinations of these systems around the internet, often several on a single site. There is no single correct choice within each pair; it depends on the type of information you are looking to present.


The Test

See? I told you that there would be a test. You didn't pay attention!

1. What type of game results do your favorite game sites provide?
2. What type of game results do you typically use/wish you could see?
3. Wouldn't it be cool if one of the search criteria was "pagan element"? ...

Fire: Light, Chaotic. A party game, such as Balderdash. High energy with occasional flare-ups.

Air: Light, Planning. An abstract game, such as Go. Cool and directed.

Water: Heavy, Chaotic. A Euro-game, such as El Grande. Immersive and interactive.

Earth: Heavy, Planning. A war-game, such as ASL. Wide and solid.

Yehuda

Only five days left to nominate for the Board Game Internet Awards! Hurry!

3 comments:

Philippe said...

Your Discrete vs Weighted Criteria sounds a lot like one of the non-standard logics I was studying a long time ago. There must be some kind of fuzzy logic that handles these things well.

That's interesting since, as far as I know, fuzzy logic has traditionally been used mostly in AI (with dubious success). It would be interesting to see how it performs in a social software setting. There must be something on this topic somewhere...

Anonymous said...

Yehuda,

There are really three components that need to be addressed to answer your question about Discrete vs Weighted Criteria in real-world systems: the first is Representation, the second is Similarity Measurement, and the third is Classification.

Representation has to do with representing large amounts of data in a computationally manageable fashion. Consider two newspaper articles. One could index every word in a document and use that as the representation of the document (it is, after all, the document). In order to compare any two items, they must share common attributes, so attributes are added to each item until all known attributes are represented; in the example, all words from both articles. This actually makes the representation of each article larger than the original, and if more articles are added to the comparison, the total number of attributes skyrockets. That creates a computational nightmare (not to mention a storage problem).

The problem is usually solved by first selecting a representation that is an abstraction of each item and then reducing the dimensions of that abstraction to a manageable level. In textual systems (e.g. Google), one method of abstraction is word frequency: every occurrence of every word is counted, for every document to be compared (obviously, documents have counts of 0 for many words). This results in vectors with many word-frequency attributes (one for each possible word). These large vectors could be compared directly, but computing over all the elements is very time consuming, especially if there are many items to compare. The solution is to reduce the number of dimensions of the vectors through mathematical techniques such as Singular Value Decomposition (SVD). The net result is a compressed representation of the originals that can be easily compared.
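
As a toy version of that pipeline in Python (using numpy, with two made-up "documents"), just to make the shapes concrete:

    import numpy as np

    docs = ["the cat sat on the mat", "the dog sat on the log"]

    # One attribute per word in the combined vocabulary.
    vocab = sorted(set(" ".join(docs).split()))

    # Term-frequency matrix: one row per document, one column per word;
    # in a real corpus most entries would be 0.
    tf = np.array([[d.split().count(w) for w in vocab] for d in docs],
                  dtype=float)

    # Reduce dimensions via SVD, keeping the k largest singular values
    # (in a real corpus k would be far smaller than the vocabulary).
    U, S, Vt = np.linalg.svd(tf, full_matrices=False)
    k = 2
    compressed = U[:, :k] * S[:k]  # one short row per document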

Once the representation is established, a Similarity Measurement is quite straightforward. The simplest method is a vector dot product:

Σ_i (X_i * Y_i), where
i runs over all the representation elements in the compressed vectors,
X_i is the i-th element of the first document's vector, and
Y_i is the i-th element of the second's.

The above yields a similarity score between any two documents. Classification is then really nothing more than comparing a document's representation to the representation of a particular class within the classification system. Usually classification systems start with known classes and compare items to these predefined categories in order to classify each one. This requires that someone develop good classes, which, as it turns out, is surprisingly hard to do.
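
Continuing the toy sketch, with invented class vectors (in a real system each class vector would be derived from example documents):

    import numpy as np

    def similarity(x, y):
        # The measure above: a plain dot product of compressed vectors.
        return float(np.dot(x, y))

    # Hypothetical compressed vectors for two predefined classes.
    classes = {"sports": np.array([0.9, 0.1]),
               "politics": np.array([0.1, 0.9])}

    doc = np.array([0.8, 0.3])  # a document's compressed vector

    # Classify by the most similar class representation.
    label = max(classes, key=lambda c: similarity(doc, classes[c]))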

One interesting approach is to reverse the problem: first cluster (i.e. group) similar items together, and then determine/create a class for each cluster that emerges. A particularly straightforward technique for this is the Kohonen Self-Organizing Feature Map (SOM or SOFM). There is still a bit of a priori classification with a SOM, in that you first determine the number of classes you want, but you do not have to define what those classes are.
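
A bare-bones SOM sketch in Python/numpy on made-up data (a real implementation would tune the learning-rate and neighborhood schedules more carefully):

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.random((100, 3))   # 100 items, 3 features each (made up)
    nodes = rng.random((4, 3))    # a 1-D map of 4 nodes = 4 emergent classes

    for t in range(500):
        lr = 0.5 * (1 - t / 500)            # decaying learning rate
        sigma = 1.0 * (1 - t / 500) + 0.01  # decaying neighborhood width
        x = data[rng.integers(len(data))]   # pick a random item
        winner = int(np.argmin(((nodes - x) ** 2).sum(axis=1)))
        for j in range(len(nodes)):
            # Neighbors of the winning node move toward the item too,
            # with a Gaussian falloff by distance on the map.
            h = np.exp(-((j - winner) ** 2) / (2 * sigma ** 2))
            nodes[j] += lr * h * (x - nodes[j])

    # Each item's emergent class is the node it now matches best.
    labels = [int(np.argmin(((nodes - x) ** 2).sum(axis=1))) for x in data]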

For more on text retrieval and searching, check out:
Text Analysis with Compare

For more on Kohonen Maps, check out:
Kohonen Maps

Yehuda Berlinger said...

Ken: uh oh, I think I'm going to fail. :-) Thanks for the info. I only got a minor in math.

As you say, now we have to figure out good values for the criteria, as well as good criteria.

It is surprising to me how many people and sites categorize games according to a strange mishmash of theme, mechanics and/or audience, instead of choosing only one, or instead of creating a sensible vector system.

Yehuda