When we think of sanitizing user input, we often focus on the client-facing side of it: setting appropriate types on our HTML input elements and (hopefully) running some client-side validation logic as well.
But, as every developer knows, you have to do the same on the server, and C# offers a slew of built-in functions to do just that.
In this article I will go over some of the more commonly used built-in C# methods for validating and sanitizing user input before it hits your database.
1. Using parameterized queries
If you're not already using parameterized queries, then stop reading and go implement them right away. And if you're not sure how, this article on how to prevent SQL injections in C# will show you how.
But essentially, when you're reading data from a database (or writing to it), you don't want to simply toss a user's input directly into your queries. That input could contain malicious SQL that wreaks havoc on your database.
Parameterized queries mitigate that risk by treating any given user input as plain data, never as part of the SQL command itself.
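// SqlCommand comes from System.Data.SqlClient (or Microsoft.Data.SqlClient)
using (SqlCommand command = new SqlCommand("SELECT * FROM Users WHERE Username = @Username", connection))
{
    // The value travels separately from the SQL text, so it is never executed as SQL
    command.Parameters.AddWithValue("@Username", userInput);
    // Execute the command here, e.g. with command.ExecuteReader()
}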
Any user-supplied value that you include in your queries should go through this process, regardless of its data type or how the user entered it.
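For a fuller picture, here is a minimal sketch of a parameterized lookup with an explicitly typed parameter. It assumes the Microsoft.Data.SqlClient NuGet package (the older System.Data.SqlClient exposes the same types). The UserQueries class, the Id and Username columns, and the 50-character column size are made up for illustration.

```csharp
using System;
using System.Data;
using Microsoft.Data.SqlClient;

public static class UserQueries
{
    // Hypothetical helper: builds a command whose SQL text never contains the user's input.
    public static SqlCommand BuildFindUserCommand(SqlConnection connection, string userInput)
    {
        var command = new SqlCommand(
            "SELECT Id, Username FROM Users WHERE Username = @Username", connection);

        // Declaring the type and size up front avoids the type inference AddWithValue performs.
        // NVarChar(50) is an assumption about the column definition.
        command.Parameters.Add("@Username", SqlDbType.NVarChar, 50).Value =
            (object)userInput ?? DBNull.Value;

        return command;
    }

    // Runs the query on an already opened connection and reports whether a row matched.
    public static bool UserExists(SqlConnection connection, string userInput)
    {
        using (SqlCommand command = BuildFindUserCommand(connection, userInput))
        using (SqlDataReader reader = command.ExecuteReader())
        {
            return reader.Read();
        }
    }
}
```

Even if someone submits something like x' OR '1'='1, the command text stays exactly as written and the whole string is compared against the Username column as a literal value.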
2. HTML Encode and Decode
If you're accepting user input that's going to eventually be shown on an HTML page, such as a message board or a comments section, then you definitely want to prevent users from including any HTML content and potentially leaving you vulnerable to script injection attacks or cross-site scripting attacks (XSS).
C# offers the System.Net.WebUtility.HtmlEncode function to handle just that:
string sanitizedInput = System.Net.WebUtility.HtmlEncode(userInput);
And if you do allow for HTML rendering on the page, perhaps after some review process, then you can call the HtmlDecode method to switch back. Keep in mind that decoding restores the original markup, so only do it for content you trust.
This is good practice for essentially any user-supplied text that ends up on a page, regardless of how a user adds it into your system.
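To make the effect concrete, here is a small sketch of wrapping the encoder in a helper. The HtmlSanitizer class and EncodeForHtml method are hypothetical names, and turning null into an empty string is a design choice for this example, not something the framework does for you.

```csharp
using System;
using System.Net;

public static class HtmlSanitizer
{
    // Encodes user-supplied text so the browser displays it instead of interpreting it.
    public static string EncodeForHtml(string userInput)
    {
        // HtmlEncode hands null back unchanged; normalize that to an empty string.
        return WebUtility.HtmlEncode(userInput) ?? string.Empty;
    }
}
```

Calling HtmlSanitizer.EncodeForHtml("<script>alert(1)</script>") returns &lt;script&gt;alert(1)&lt;/script&gt;, which the browser renders as visible text. Passing that result to WebUtility.HtmlDecode gives back the original string.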
3. Using Regex for specific patterns
Regex is perfect for those scenarios where you have a very specific pattern that you need to adhere to, whether it be an email address, phone number or some kind of ID.
var emailPattern = @"^[^@\s]+@[^@\s]+\.[^@\s]+$"; // Only allows text like "name@example.com"
bool isValidEmail = Regex.IsMatch(userInput, emailPattern);
C# makes the process simple with the Regex class in the System.Text.RegularExpressions namespace, particularly its IsMatch method, which accepts an input string and a pattern and returns true or false depending on whether the input matches the pattern.
And yes, regex patterns can be notoriously difficult to understand, so it's usually best to use one that has already been tested.
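Because the pattern runs against untrusted input, it can be worth guarding the match with a timeout and a null check. The sketch below uses the same email pattern shape, anchored with \z instead of $ so that a trailing newline is rejected. The EmailValidator class and the 250 ms budget are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailValidator
{
    // \z matches only at the very end of the string; $ would also match before a final newline.
    private const string EmailPattern = @"^[^@\s]+@[^@\s]+\.[^@\s]+\z";

    // Assumed time budget; tune it for your own workload.
    private static readonly TimeSpan MatchTimeout = TimeSpan.FromMilliseconds(250);

    public static bool IsValidEmail(string userInput)
    {
        if (string.IsNullOrWhiteSpace(userInput))
        {
            return false; // Regex.IsMatch throws ArgumentNullException on null input
        }

        try
        {
            return Regex.IsMatch(userInput, EmailPattern, RegexOptions.None, MatchTimeout);
        }
        catch (RegexMatchTimeoutException)
        {
            // Treat input that takes too long to evaluate as invalid.
            return false;
        }
    }
}
```

A pattern like this only checks the general shape of an address. It does not confirm that the mailbox exists, so a confirmation email is still the usual way to verify ownership.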
4. Sanitize XML data
If you're going to be storing user data in an XML format, then you have to take the same precautions as in the encoding/decoding scenario above. Special characters used in XML, such as < and >, can cause the XML to become malformed, and that can break your system down the line when you read that data back.
You can use the System.Security.SecurityElement.Escape method in order to ensure that this doesn't happen.
string safeXml = System.Security.SecurityElement.Escape(userInput); // Replaces <, >, &, " and ' with their entity equivalents
These days JSON is typically preferred over XML for storing data, but there are still many scenarios where this can come in handy, such as dynamically generating XML sitemaps for a website.
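As a rough sketch, the escape call can sit next to an alternative that lets LINQ to XML handle the escaping while the element is built. The XmlSanitizer class and the "comment" element name are hypothetical.

```csharp
using System.Security;
using System.Xml.Linq;

public static class XmlSanitizer
{
    // Escapes the five characters XML reserves: < > & " '
    public static string EscapeForXml(string userInput)
    {
        // Escape returns null for null input; normalize that to an empty string.
        return SecurityElement.Escape(userInput) ?? string.Empty;
    }

    // Alternative: XElement escapes text content on its own when it is serialized.
    // The "comment" element name is a made-up example.
    public static string BuildCommentElement(string userInput)
    {
        return new XElement("comment", userInput ?? string.Empty).ToString();
    }
}
```

For example, XmlSanitizer.EscapeForXml("Tom & <Jerry>") returns Tom &amp; &lt;Jerry&gt;. Neither approach strips control characters that XML 1.0 forbids, so input that may contain them needs separate filtering.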
5. Trim and Restrict length
Depending on the type of data, you typically want to set some kind of character limit to avoid any kind of overflow or unexpectedly long input that can break your system.
string sanitizedInput = userInput.Trim();
if (sanitizedInput.Length > maxLength)
{
    sanitizedInput = sanitizedInput.Substring(0, maxLength);
}
Ideally, you also want to set this limit directly in your database schema and on the client-side as well, letting users know just how much they can type into a given field.
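The same idea can be wrapped in a reusable helper that also copes with null input and avoids cutting a surrogate pair (an emoji, for instance) in half. This is a minimal sketch. The InputLimiter class and TrimAndLimit method are hypothetical names.

```csharp
using System;

public static class InputLimiter
{
    // Trims whitespace and caps the result at maxLength UTF-16 code units.
    public static string TrimAndLimit(string userInput, int maxLength)
    {
        if (maxLength < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(maxLength));
        }

        string trimmed = (userInput ?? string.Empty).Trim();
        if (trimmed.Length <= maxLength)
        {
            return trimmed;
        }

        int cut = maxLength;
        // If the cut would land between the two halves of a surrogate pair, back up by one.
        if (cut > 0 && char.IsHighSurrogate(trimmed[cut - 1]))
        {
            cut--;
        }

        return trimmed.Substring(0, cut);
    }
}
```

InputLimiter.TrimAndLimit("  hello world  ", 5) returns "hello". Silently truncating is a design choice. For fields such as usernames or email addresses, rejecting over-long input with an error message is often the safer option.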
6. Allow only specific characters
Depending on the content type again, you might want to restrict just what characters are allowed in your system. And you can use the fanciful Regex (from above) once again for that.
var usernamePattern = @"^[a-zA-Z0-9]+$"; // Letters and digits only, at least one character
bool isValidUsername = Regex.IsMatch(userInput, usernamePattern);
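A slightly stricter sketch combines the character allowlist with length bounds in a single pattern. The UsernameValidator class and the 3 to 20 character range are assumptions for illustration.

```csharp
using System.Text.RegularExpressions;

public static class UsernameValidator
{
    // ASCII letters and digits only, 3 to 20 characters (the bounds are assumed).
    // \z anchors at the true end of the string, so a trailing newline is rejected.
    private static readonly Regex UsernamePattern =
        new Regex(@"^[a-zA-Z0-9]{3,20}\z", RegexOptions.CultureInvariant);

    public static bool IsValidUsername(string userInput)
    {
        return userInput != null && UsernamePattern.IsMatch(userInput);
    }
}
```

Listing the allowed characters explicitly, rather than trying to block known bad ones, means anything unexpected is rejected by default.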
These are just some of the more common scenarios that pretty much every developer out there should be mindful of, but the list is far from complete.
With the web getting more complex and advanced on a daily basis, there are many newer vectors that malicious actors could take advantage of, and it's every developer's job to stay up to date and keep their systems secure.