Removing duplicates from a list in C# is a common task, especially when working with large datasets. C# provides multiple ways to achieve this efficiently, leveraging built-in collections and LINQ.
Using HashSet (Fastest for Unique Elements)
A HashSet<T> automatically removes duplicates because it stores only unique values. This is one of the fastest methods, although note that a HashSet<T> does not guarantee element order:
List<int> numbers = new List<int> { 1, 2, 2, 3, 4, 4, 5 };
numbers = new HashSet<int>(numbers).ToList();
Console.WriteLine(string.Join(", ", numbers)); // Output: 1, 2, 3, 4, 5
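The HashSet<T> constructor also accepts an IEqualityComparer<T>, which is useful when "duplicate" means something looser than exact equality. A small sketch using the built-in StringComparer.OrdinalIgnoreCase for case-insensitive de-duplication:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<string> names = new List<string> { "alice", "Alice", "BOB", "bob" };

// The comparer makes the set treat strings differing only in case as duplicates.
names = new HashSet<string>(names, StringComparer.OrdinalIgnoreCase).ToList();

Console.WriteLine(string.Join(", ", names));
```

The set keeps the first occurrence of each value it encounters, so "alice" and "BOB" survive here.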
Using LINQ Distinct (Concise and Readable)
LINQ’s Distinct() method provides a concise, readable way to remove duplicates; in practice it yields elements in the order of their first occurrence:
List<int> numbers = new List<int> { 1, 2, 2, 3, 4, 4, 5 };
numbers = numbers.Distinct().ToList();
Console.WriteLine(string.Join(", ", numbers)); // Output: 1, 2, 3, 4, 5
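Distinct() has a matching overload that takes an IEqualityComparer<T>. A sketch of the same case-insensitive scenario; note that the documentation describes the result as unordered, even though the current implementation yields first occurrences in order:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<string> names = new List<string> { "alice", "Alice", "BOB", "bob" };

// Distinct with a comparer: strings differing only in case count as duplicates.
names = names.Distinct(StringComparer.OrdinalIgnoreCase).ToList();

Console.WriteLine(string.Join(", ", names));
```

Unlike building a HashSet<T> up front, Distinct() evaluates lazily, which matters if you only enumerate part of a large sequence.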
Removing Duplicates by Custom Property (For Complex Objects)
When working with objects, DistinctBy(), available in .NET 6 and later, simplifies duplicate removal based on a property:
using System;
using System.Collections.Generic;
using System.Linq;

List<Person> people = new List<Person>
{
    new Person { Name = "Alice", Age = 30 },
    new Person { Name = "Bob", Age = 25 },
    new Person { Name = "Alice", Age = 30 }
};

// Keep the first Person seen for each distinct Name.
people = people.DistinctBy(p => p.Name).ToList();
Console.WriteLine(string.Join(", ", people.Select(p => p.Name))); // Output: Alice, Bob

// Type declarations must follow the top-level statements above.
class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}
For earlier .NET versions, GroupBy() achieves the same result:
people = people.GroupBy(p => p.Name).Select(g => g.First()).ToList();
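One advantage of GroupBy() over DistinctBy() is that each group retains all of the duplicates, so you can choose which one survives rather than always keeping the first. A sketch (with a Person class shaped like the one above) that keeps the youngest entry per name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<Person> people = new List<Person>
{
    new Person { Name = "Alice", Age = 30 },
    new Person { Name = "Alice", Age = 24 },
    new Person { Name = "Bob", Age = 25 }
};

// For each name, keep the entry with the smallest Age instead of the first one seen.
people = people
    .GroupBy(p => p.Name)
    .Select(g => g.OrderBy(p => p.Age).First())
    .ToList();

Console.WriteLine(string.Join(", ", people.Select(p => $"{p.Name} ({p.Age})"))); // Output: Alice (24), Bob (25)

class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}
```

GroupBy() yields groups in the order their keys first appear, so the overall element order is still predictable.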
HashSet<T> is the fastest option, but it relies on the element type's equality semantics, so it is most convenient for simple types.
Distinct() is easy to use but can be slightly slower than HashSet<T> for large lists.
DistinctBy() (or GroupBy()) is useful for complex objects but may involve performance trade-offs.
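The "simple types" limitation of HashSet<T> is not absolute: supplying an IEqualityComparer<T> lets it de-duplicate complex objects as well. A sketch with a hypothetical PersonNameComparer (the comparer is illustrative, not a framework type):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<Person> people = new List<Person>
{
    new Person { Name = "Alice", Age = 30 },
    new Person { Name = "Bob", Age = 25 },
    new Person { Name = "Alice", Age = 28 }
};

// The comparer makes the set treat two Person objects with the same Name as duplicates.
people = new HashSet<Person>(people, new PersonNameComparer()).ToList();

Console.WriteLine(string.Join(", ", people.Select(p => p.Name)));

class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

// Hypothetical comparer: equality and hashing are based on Name only.
class PersonNameComparer : IEqualityComparer<Person>
{
    public bool Equals(Person x, Person y) => x?.Name == y?.Name;
    public int GetHashCode(Person p) => p.Name?.GetHashCode() ?? 0;
}
```

Equals and GetHashCode must agree: any two objects the comparer considers equal must return the same hash code, or the set will misbehave.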
Conclusion
Choosing the best approach depends on the data type and use case. HashSet<T> is ideal for primitive types, Distinct() is simple and readable, and DistinctBy() (or GroupBy()) is effective for objects.