When selecting data from tables with SQL, you often receive redundant results that contain completely identical rows. To eliminate them, use the SQL distinct argument in the Select clause. This article walks through examples of its use, as well as situations in which it is better to avoid it.
Before turning to specific examples, we will create a couple of tables in the database.
Table preparation
Imagine that our database holds information on wallpapers in two tables. The Oboi table (wallpaper) has the fields id (unique identifier), type (type of wallpaper: paper, vinyl, etc.), color, struct (structure) and price. The Ostatki table (leftovers, that is, stock) has the fields id_oboi (a reference to the unique identifier in the Oboi table) and count (the number of rolls in stock).
Fill the tables with data. Add 9 entries to the table with wallpapers:
Oboi |
id | type | color | struct | price |
1 | Paper | Multicolor | Embossed | 56.9 |
2 | Two-layer paper | Beige | Smooth | 114.8 |
3 | Vinyl | Orange | Embossed | 504 |
4 | Non-woven | Beige | Embossed | 1020.9 |
5 | Two-layer paper | Beige | Smooth | 150.6 |
6 | Paper | Multicolor | Smooth | 95.4 |
7 | Vinyl | Brown | Smooth | 372 |
8 | Non-woven | Whites | Embossed | 980.1 |
9 | Fabric | Pink | Smooth | 1166.5 |
The stock table also has nine entries:
Ostatki |
id_oboi | count |
1 | 8 |
2 | 12 |
3 | 24 |
4 | 9 |
5 | 16 |
6 | 7 |
7 | 24 |
8 | 32 |
9 | 11 |
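The two tables above can be set up as in the following minimal sketch. It assumes SQLite accessed through Python's standard sqlite3 module; the column types, the primary key, and the foreign-key reference are assumptions, since the article does not specify them.

```python
import sqlite3

# In-memory SQLite database; types and constraints are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Oboi (
        id     INTEGER PRIMARY KEY,
        type   TEXT,
        color  TEXT,
        struct TEXT,
        price  REAL
    );
    CREATE TABLE Ostatki (
        id_oboi INTEGER REFERENCES Oboi(id),
        count   INTEGER
    );
""")

oboi_rows = [
    (1, "Paper", "Multicolor", "Embossed", 56.9),
    (2, "Two-layer paper", "Beige", "Smooth", 114.8),
    (3, "Vinyl", "Orange", "Embossed", 504),
    (4, "Non-woven", "Beige", "Embossed", 1020.9),
    (5, "Two-layer paper", "Beige", "Smooth", 150.6),
    (6, "Paper", "Multicolor", "Smooth", 95.4),
    (7, "Vinyl", "Brown", "Smooth", 372),
    (8, "Non-woven", "Whites", "Embossed", 980.1),
    (9, "Fabric", "Pink", "Smooth", 1166.5),
]
# (id_oboi, count): number of rolls in stock per wallpaper
ostatki_rows = [(1, 8), (2, 12), (3, 24), (4, 9), (5, 16),
                (6, 7), (7, 24), (8, 32), (9, 11)]

conn.executemany("INSERT INTO Oboi VALUES (?, ?, ?, ?, ?)", oboi_rows)
conn.executemany("INSERT INTO Ostatki VALUES (?, ?)", ostatki_rows)
conn.commit()

oboi_total = conn.execute("SELECT COUNT(*) FROM Oboi").fetchone()[0]
ostatki_total = conn.execute("SELECT COUNT(*) FROM Ostatki").fetchone()[0]
print(oboi_total, ostatki_total)
```

Other DBMSs may treat count or type as reserved words and require the column names to be quoted.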
Let us now describe how to use distinct in SQL.
Place distinct in the Select clause
The distinct argument is placed immediately after the Select keyword. It applies to all columns listed in the Select clause at once, because it removes from the final result the rows that are identical in every selected column. It is therefore enough to write "select distinct" once per query. The exception is the use of distinct inside aggregate functions, which we will consider a little later.
Keep in mind that most DBMSs will reject a query of the form:
SELECT DISTINCT Ostatki.count, DISTINCT Oboi.* FROM Oboi INNER JOIN Ostatki ON Oboi.id = Ostatki.id_oboi
Here the argument is specified more than once. The same happens if it is specified once, but before the second, third, or a later column. In either case you will receive a syntax error.
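The rejection can be reproduced with a small sketch. It assumes SQLite via Python's sqlite3 module, where a syntax error surfaces as sqlite3.OperationalError; other DBMSs report it through their own error types and messages. The tables are left empty because only the parsing matters here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Oboi (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE Ostatki (id_oboi INTEGER, count INTEGER);
""")

# distinct repeated before a second column: not valid SQL
bad_query = """
    SELECT DISTINCT Ostatki.count, DISTINCT Oboi.*
    FROM Oboi INNER JOIN Ostatki ON Oboi.id = Ostatki.id_oboi
"""
# distinct written once, right after SELECT: valid
good_query = """
    SELECT DISTINCT Ostatki.count, Oboi.*
    FROM Oboi INNER JOIN Ostatki ON Oboi.id = Ostatki.id_oboi
"""

try:
    conn.execute(bad_query)
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

good_rows = conn.execute(good_query).fetchall()  # parses fine, no rows yet

print(error_message)
print(good_rows)
```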
Using distinct in standard queries
With well-designed tables and correctly entered data, completely identical rows cannot occur within a single table. Running "Select distinct *" against a single table is therefore rarely useful.
Imagine that we need to find out which types of wallpaper we have. For convenience, we sort by type right away:
SELECT Oboi.type FROM Oboi ORDER BY type
And we get the result:
type |
Fabric |
Non-woven |
Non-woven |
Paper |
Paper |
Two-layer paper |
Two-layer paper |
Vinyl |
Vinyl |
As you can see, the result contains duplicate rows. If we add distinct to the Select clause:
SELECT DISTINCT Oboi.type FROM Oboi ORDER BY type
then we get the result without repetitions:
type |
Fabric |
Non-woven |
Paper |
Two-layer paper |
Vinyl |
Thus, provided the data were entered correctly, we can answer a customer's call or request right away: the store has no liquid wallpapers, glass wallpapers, or acrylic wallpapers. Given that a store's assortment is usually not limited to one hundred wallpapers, looking through a list of non-unique types would be rather laborious.
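The difference between the two queries can be checked with a minimal sketch, again assuming SQLite through Python's sqlite3 module and loading only the id and type columns. SQLite sorts the English type names with its default binary collation, so the row order may differ under another collation or language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Oboi (id INTEGER PRIMARY KEY, type TEXT)")
conn.executemany(
    "INSERT INTO Oboi VALUES (?, ?)",
    [(1, "Paper"), (2, "Two-layer paper"), (3, "Vinyl"),
     (4, "Non-woven"), (5, "Two-layer paper"), (6, "Paper"),
     (7, "Vinyl"), (8, "Non-woven"), (9, "Fabric")],
)

# Without distinct: one row per wallpaper, types repeat
all_types = [row[0] for row in
             conn.execute("SELECT Oboi.type FROM Oboi ORDER BY type")]

# With distinct: each type appears once
unique_types = [row[0] for row in
                conn.execute("SELECT DISTINCT Oboi.type FROM Oboi ORDER BY type")]

print(len(all_types))   # 9
print(unique_types)     # ['Fabric', 'Non-woven', 'Paper', 'Two-layer paper', 'Vinyl']
```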
Using distinct within aggregate functions
The SQL distinct argument can be used with any aggregate function. For Min and Max, however, it has no effect, because duplicates do not change the smallest or largest value. And when calculating a sum or an average, it is hard to imagine a situation in which repeated values should be ignored.
Suppose we want to know how full our warehouse is, so we run a query that calculates the total number of rolls in stock:
SELECT SUM(Ostatki.count) FROM Ostatki
The query returns 143. If we change it to:
SELECT SUM(DISTINCT Ostatki.count) FROM Ostatki
then we get only 119, because the wallpapers with ids 3 and 7 are in stock in the same quantity (24 rolls), and that value is added only once. This answer is obviously incorrect.
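A short sketch confirms both totals and also shows that distinct changes nothing for Max. It assumes SQLite via Python's sqlite3 module, which accepts distinct inside single-argument aggregate functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Ostatki (id_oboi INTEGER, count INTEGER)")
conn.executemany(
    "INSERT INTO Ostatki VALUES (?, ?)",
    [(1, 8), (2, 12), (3, 24), (4, 9), (5, 16),
     (6, 7), (7, 24), (8, 32), (9, 11)],
)

total, total_distinct, largest, largest_distinct = conn.execute("""
    SELECT SUM(Ostatki.count),
           SUM(DISTINCT Ostatki.count),
           MAX(Ostatki.count),
           MAX(DISTINCT Ostatki.count)
    FROM Ostatki
""").fetchone()

print(total)                      # 143: every roll counted
print(total_distinct)             # 119: the repeated value 24 is added once
print(largest, largest_distinct)  # 32 32: duplicates cannot change a maximum
```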
Most often, distinct is used in SQL with the Count function. For example, we can easily find out how many unique types of wallpaper we have:
SELECT COUNT(DISTINCT Oboi.type) FROM Oboi
And we get the result 5: ordinary paper, two-layer paper, vinyl, fabric, and non-woven. Surely everyone has seen an advertisement like "Only we have more than 20 different types of wallpaper!", which means that the store carries not a couple of dozen rolls of the same thing, but wallpapers of the most diverse modern types.
Interestingly, a single query can contain several Count functions, each with or without distinct. Use inside aggregate functions is the only situation in which distinct can appear several times in a Select clause.
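Several Count functions in one query, with and without distinct, might look like the following sketch. It assumes SQLite via Python's sqlite3 module and loads only the id, type, and color columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Oboi (id INTEGER PRIMARY KEY, type TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO Oboi VALUES (?, ?, ?)",
    [(1, "Paper", "Multicolor"), (2, "Two-layer paper", "Beige"),
     (3, "Vinyl", "Orange"), (4, "Non-woven", "Beige"),
     (5, "Two-layer paper", "Beige"), (6, "Paper", "Multicolor"),
     (7, "Vinyl", "Brown"), (8, "Non-woven", "Whites"),
     (9, "Fabric", "Pink")],
)

# distinct appears twice here, once per aggregate function
row_count, type_count, color_count = conn.execute("""
    SELECT COUNT(Oboi.type),
           COUNT(DISTINCT Oboi.type),
           COUNT(DISTINCT Oboi.color)
    FROM Oboi
""").fetchone()

print(row_count)    # 9 rows in total
print(type_count)   # 5 unique types
print(color_count)  # 6 unique colors
```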
When to abandon the argument
The SQL distinct argument should be avoided in two cases:
- You select from tables and are sure that the values in each are unique. In this case the argument is pointless, because it only adds load on the server or client (depending on the type of DBMS).
- You are afraid of losing data that you need. Let's explain.
Suppose your boss asks you to list the wallpapers that you have, with just two columns: type and color. Out of habit, you specify the distinct argument:
SELECT DISTINCT Oboi.type, Oboi.color FROM Oboi ORDER BY Oboi.type
and lose some of the data:
type | color |
Fabric | Pink |
Non-woven | Beige |
Non-woven | Whites |
Paper | Multicolor |
Two-layer paper | Beige |
Vinyl | Brown |
Vinyl | Orange |
It may seem that we have only one article each of ordinary paper and two-layer paper wallpaper, although even our small table contains two of each (the result without distinct):
type | color |
Fabric | Pink |
Non-woven | Whites |
Non-woven | Beige |
Paper | Multicolor |
Paper | Multicolor |
Two-layer paper | Beige |
Two-layer paper | Beige |
Vinyl | Brown |
Vinyl | Orange |
Therefore, as with any query, be careful with the distinct argument and decide whether to apply it based on the task at hand.
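The loss is easy to measure by comparing row counts with and without distinct. The sketch below assumes SQLite via Python's sqlite3 module; the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Oboi (id INTEGER PRIMARY KEY, type TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO Oboi VALUES (?, ?, ?)",
    [(1, "Paper", "Multicolor"), (2, "Two-layer paper", "Beige"),
     (3, "Vinyl", "Orange"), (4, "Non-woven", "Beige"),
     (5, "Two-layer paper", "Beige"), (6, "Paper", "Multicolor"),
     (7, "Vinyl", "Brown"), (8, "Non-woven", "Whites"),
     (9, "Fabric", "Pink")],
)

with_distinct = conn.execute(
    "SELECT DISTINCT Oboi.type, Oboi.color FROM Oboi ORDER BY Oboi.type"
).fetchall()
without_distinct = conn.execute(
    "SELECT Oboi.type, Oboi.color FROM Oboi ORDER BY Oboi.type"
).fetchall()

# Pairs that occur more than once are the articles hidden by distinct
hidden = sorted(pair for pair in set(without_distinct)
                if without_distinct.count(pair) > 1)

print(len(with_distinct))     # 7 rows
print(len(without_distinct))  # 9 rows
print(hidden)  # [('Paper', 'Multicolor'), ('Two-layer paper', 'Beige')]
```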
Distinct alternative
The opposite of distinct is the All argument. When it is applied, duplicate rows are kept. But since the DBMS outputs all rows by default anyway, All is more of a clarifying qualifier than an argument with a real function.
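That All only spells out the default behavior can be checked with a small sketch, assuming SQLite via Python's sqlite3 module, which accepts SELECT ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Oboi (id INTEGER PRIMARY KEY, type TEXT)")
conn.executemany(
    "INSERT INTO Oboi VALUES (?, ?)",
    [(1, "Paper"), (2, "Two-layer paper"), (3, "Vinyl"),
     (4, "Non-woven"), (5, "Two-layer paper"), (6, "Paper"),
     (7, "Vinyl"), (8, "Non-woven"), (9, "Fabric")],
)

plain = conn.execute("SELECT Oboi.type FROM Oboi ORDER BY type").fetchall()
with_all = conn.execute("SELECT ALL Oboi.type FROM Oboi ORDER BY type").fetchall()

print(plain == with_all)  # True: All keeps duplicates, just like the default
print(len(with_all))      # 9
```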
We hope you now understand when distinct should be applied in SQL. As it turned out, even such a simple argument carries a quite tangible risk of losing some data and displaying inaccurate information.