If you’ve ever run into the dreaded ?? problem, this guide will walk you through how to properly configure your database to store emojis and other Unicode characters. I myself had this issue for years, and assumed that it had something to do with my CMS. And so I added it to my to-list, and years later I decided to finally mark it as complete.
Why Do Emojis Turn Into ???
The issue stems from the way SQL Server handles character encoding. Emojis are part of the Unicode standard, requiring a data type and collation that support Unicode. If your table uses a non-Unicode data type like VARCHAR, SQL Server replaces unsupported characters (like emojis) with ?? because it cannot interpret them.
When I launched this blog over 10+ years ago, I didn't take that into consideration and assumed that VARCHAR(MAX) was enough to handle my blog content needs. When I started to incorporate more emojis (when Windows added the winkey + . shortcut for emojis), they were converting into ??, which pretty much meant that no articles before this one have any rendered emojis 🤷♂️ .
How to fix
In order to fix this issue, you'll need to alter your database column to have the right data type, that being NVARCHAR.
ALTER TABLE table_name
ALTER COLUMN column_name NVARCHAR(MAX)
Why NVARCHAR?
The issue with VARCHAR is that it only supports ASCII and extended ASCII characters, which take up 1 byte per character.
However NVARCHAR supports UNICODE (UTF-16 encoding), which allows for storage of emojis and other characters, such as multilingual text. This takes up 2 bytes per character.
And yes, that means that when you make this update to your database, the amount of storage used will indeed go up. So if you are working with a shared database server, with limited storage space, do take that into consideration.
Recommended workflow for a safe update
Alter a column from VARCHAR(MAX) to NVARCHAR(MAX) is typically safe for the most part. However, it's always good to be cautious when performing any kind of update, particularly on a production level database with thousands if not millions of records.
If you have the storage space available, I personally tend to back up my column data into a new column, for safe keeping in case I need to quickly revert back.
1. Add a temporary column with NVARCHAR
ALTER TABLE table_name ADD temp_column_name NVARCHAR(MAX)
2. Copy data to the new column
UPDATE table_name SET temp_column_name = your_column
3. Alter your column
ALTER TABLE table_name
ALTER COLUMN column_name NVARCHAR(MAX)
4. Verify it works
Check to ensure that your column data is unchanged and then test with a quick insert:
INSERT INTO table_name (content)
VALUES (N'✌️ This is only a test');
If you query your data right after, you should see the rendered emoji and not ?? . Also note that you must insert your string with the N prefix, as this informs your database that this is indeed a string with Unicode characters.
Typically VARCHAR will be faster for smaller non-Unicode data sets, because it uses less storage as mentioned above (1 byte per character).
However, NVARCHAR might be a bit slower because it does consume more memory overall (2 bytes per character).
All to say that if your database tables don't require any kind of special character rendering, then sticking to VARCHAR for those cases makes sense.
In regards to indexes, I would say avoid creating indexes directly on NVARCHAR(MAX) columns and use computed columns or full-text search for better performance.
Conclusion
While it is great to finally be able to have emojis on this blog ⚡🗡️, I did have to consider my databases limitations, mainly the limited amount of storage space. I did the proper backups, cloned my data temporarily, ran the update and tested. With any database update, it is always important to treat it with respect, as one improper command could potentially lead to a long night ☕.
Walter Guevara is a Computer Scientist, software engineer, startup founder and previous mentor for a coding bootcamp. He has been creating software for the past 20 years.