Collation specifies how data is sorted and compared in a database. It supplies the sorting rules as well as the case-sensitivity and accent-sensitivity properties for the data in the database.
A character set is a set of symbols and encodings. A collation is a set of rules for comparing characters in a character set.
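To make the "symbols and encodings" part of the definition concrete, here is a minimal Python sketch that encodes one symbol, 'é', under two different character sets. The codec names "latin-1" and "utf-8" are Python's own and are used only for illustration. This shows the general idea and makes no claim about how MySQL stores strings internally.

```python
# One symbol, two character sets, two different encodings.
# "latin-1" and "utf-8" are Python codec names chosen for illustration.
symbol = "é"

latin1_bytes = symbol.encode("latin-1")  # a single byte: 0xE9
utf8_bytes = symbol.encode("utf-8")      # two bytes: 0xC3 0xA9

print(latin1_bytes.hex())  # e9
print(utf8_bytes.hex())    # c3a9

# Reading UTF-8 bytes with the wrong character set produces different
# symbols: the two bytes come back as two separate characters ('Ã©').
mojibake = utf8_bytes.decode("latin-1")
print(len(mojibake))  # 2
```

The symbol is the same in both cases; only the encoding differs, which is exactly what distinguishes one character set from another.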
MySQL can do these things for you:
Store strings using a variety of character sets
Compare strings using a variety of collations
Mix strings with different character sets or collations in the same server, the same database, or even the same table
Enable specification of character set and collation at any level
In these respects, MySQL is far ahead of most other database management systems. However, to use these features effectively, you need to know what character sets and collations are available, how to change the defaults, and how they affect the behavior of string operators and functions.
Read more: CharacterSets/Collations
Character Sets and Collations
To make the distinction between a character set and a collation clear, consider an example of an imaginary character set.
Suppose that we have an alphabet with four letters: 'A', 'B', 'a', 'b'. We give each letter a number: 'A' = 0, 'B' = 1, 'a' = 2, 'b' = 3.
The letter 'A' is a symbol, the number 0 is the encoding for 'A', and the combination of all four letters and their encodings is a character set.
Now, suppose that we want to compare two string values, 'A' and 'B'. The simplest way to do this is to look at the encodings: 0 for 'A' and 1 for 'B'.
Because 0 is less than 1, we say 'A' is less than 'B'. Now, what we've just done is apply a collation to our character set.
The collation is a set of rules (only one rule in this case): "compare the encodings."
We call this simplest of all possible collations a binary collation.
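The binary collation described above can be sketched in a few lines of Python over the imaginary four-letter character set. The names CHARSET, binary_key, and binary_compare are hypothetical, introduced only for this illustration. For strings longer than one letter, the sketch assumes the encodings are compared position by position (Python's lexicographic list comparison), a detail the example above does not spell out.

```python
from typing import List

# The imaginary character set: each symbol maps to its encoding.
CHARSET = {"A": 0, "B": 1, "a": 2, "b": 3}

def binary_key(s: str) -> List[int]:
    """Binary collation: the sort key is just the sequence of encodings."""
    return [CHARSET[ch] for ch in s]

def binary_compare(x: str, y: str) -> int:
    """Return -1, 0, or 1 using the single rule "compare the encodings"."""
    kx, ky = binary_key(x), binary_key(y)
    return (kx > ky) - (kx < ky)

print(binary_compare("A", "B"))  # -1: 0 < 1, so 'A' < 'B'
print(binary_compare("a", "B"))  # 1: 2 > 1, so 'a' > 'B'
print(sorted(["b", "A", "a", "B"], key=binary_key))  # ['A', 'B', 'a', 'b']
```

Because the rule looks only at encodings, every uppercase letter sorts before every lowercase letter in this character set.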
Character Sets and Collations in General
What if we want to say that the lowercase and uppercase letters are equivalent? Then we would have at least two rules:
(1) treat the lowercase letters 'a' and 'b' as equivalent to 'A' and 'B'; (2) then compare the encodings.
We call this a case-insensitive collation. It's a little more complex than a binary collation.
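The two rules of the case-insensitive collation can be sketched in Python by folding case before comparing encodings. As before, CHARSET, FOLD, ci_key, and ci_compare are hypothetical names used only for this illustration, and multi-letter strings are assumed to be compared position by position.

```python
from typing import List

# The imaginary character set: each symbol maps to its encoding.
CHARSET = {"A": 0, "B": 1, "a": 2, "b": 3}

# Rule 1: treat the lowercase letters as equivalent to their uppercase partners.
FOLD = {"a": "A", "b": "B"}

def ci_key(s: str) -> List[int]:
    """Fold case first (rule 1), then use the encodings (rule 2)."""
    return [CHARSET[FOLD.get(ch, ch)] for ch in s]

def ci_compare(x: str, y: str) -> int:
    """Return -1, 0, or 1 under the case-insensitive collation."""
    kx, ky = ci_key(x), ci_key(y)
    return (kx > ky) - (kx < ky)

print(ci_compare("a", "A"))  # 0: equal under this collation
print(ci_compare("a", "B"))  # -1: 'a' folds to 'A' (0), and 0 < 1
# 'b' and 'B' tie, so sorted() keeps their input order.
print(sorted(["b", "A", "a", "B"], key=ci_key))  # ['A', 'a', 'b', 'B']
```

Under a plain comparison of encodings, 'a' (2) would sort after 'B' (1); applying the case-folding rule first reverses that order and makes 'a' equal to 'A'.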