For a simple example on summarizing data, please have a look at Part 4 of this series.
This post is based on the Udemy.com video series by Philip Burton on Querying SQL Server for the purpose of passing the 70-461 exam. His series has 6 parts. He discusses summarizing data in part 2. We start with a table called [tblEmployee] with several columns of data in it. One of those columns in [DateOfBirth] with a data type of date. Summarizing data involves aggregate functions and GROUP BY. Let’s start with Philip’s simple example.
select * from [dbo].[tblEmployee] where DateOfBirth >= '19760101' and DateOfBirth < '19870101'
In the above query we are not summarizing yet. Here we are filtering. We do not want all of the rows, but only those rows that meet a certain criteria. Notice that the data of birth is less than the first day of the next year. In the next query we get all of the rows because there is no WHERE clause.
select year(DateOfBirth) as YearDateOfBirth from tblEmployee
In the next query we want to count the number of employees born in a each year. We want to show the year and how many employees were born in that year, not all of the columns.
select year(DateOfBirth) as YearDateOfBirth, Count(*) as NumberBorn from tblEmployee
You get an error with the above query. The error is shown below.
Msg 8120, Level 16, State 1, Line 1 Column 'tblEmployee.DateOfBirth' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
We can fix this by putting the DateOfBirth in a GROUP BY clause as shown below. As soon as we start using an aggregate function (count, sum, average and so on) we need to expand our query to include group by. I have added a WHERE clause and an ORDER BY clause just to show how the query is built and where the parts fit together.
select year(DateOfBirth) as YearDateOfBirth, Count(*) as NumberBorn from tblEmployee where 1=1 group by year(DateOfBirth) order by year(DateOfBirth)
How SQL Server Processes
The above query is processed in a different order than the way it is written above. What order is that? Start with the entire table tblEmployee. Finish with the select and order by. One reason this is important to understand is that you cannot use your column name YearDateOfBirth in the group by clause. The computer doesn’t understand what you mean because it hasn’t even got to the select clause yet.
- from tblEmployee
- where 1=1
- group by year(DateOfBirth)
- select year(DateOfBirth) as YearDateOfBirth, Count(*) as NumberBorn
- order by year(DateOfBirth)
You cannot use aliases in the group by clause.
Without the order by clause, you don’t know what order you’ll find it in. That unknown order is non-deterministic. In will likely not be in the exact same order each time you run it. It might be in the same order each time if yo are running it on one computer with one hard drive.