SQL basics
What is SQL?¶
SQL (Structured Query Language) is a standard language for storing, manipulating and retrieving data in databases. Databases are essentially Excel-spreadsheets on steroids. A database contains a collection of tables. Such tables can be logically grouped together in separate workspaces, called 'schemas'. Each table has a set of rows and columns. Data in such tables can have a relationship to other tables, by referencing columns in the related table. Database tables can be truly huge, containing millions of rows and being used by thousands of users and processes at the same time. All database users continuously execute SQL queries, specific requests to the database, to select, update, delete or insert their data.
Example
This guide shows example queries on a sample table called building, located in a schema called sandbox in which each row is a building.
| id | dc_id | id_src | construction_year | status |
|---|---|---|---|---|
| 1 | 25 | s83497fs | 2014 | in_use |
| 2 | 25 | 59s74146 | 1906 | demolished |
| 3 | 91 | 897sf5mf | 1985 | under_renovation |
| 4 | 91 | d7ii585s | 1994 | in_use |
Common query types¶
There are four basic query types: select, update, insert and delete.
In addition, there are also several query-types for tasks such as creating new database tables, deleting or modifying tables etcetera.
An SQL query always starts with the name of the operation.
| Type | Meaning |
|---|---|
select |
Retrieve data |
update |
Update existing data |
insert into |
Insert new data (into a particular table) |
delete |
Delete existing data |
create table |
Create a new table |
drop table |
Delete a table |
alter table |
Modify a table |
Basic select-query syntax¶
To retrieve data, we use a select query. Such query has the following basic syntax:
select column1, column2
from schema_name.table_name
The query lists the names of all columns that should be retrieved, separated with a comma.
In case you want to retrieve all columns of a particular table, you can use * instead of specifying each column name separately.
Example
An example SQL query is:
select * from sandbox.building
building, which is located in the sandbox schema.
Filtering¶
Imagine that we don't want all rows of a table, but only the rows that we are interested in. In such case we add a so-called 'where-clause' to the query, which specifies how we want to filter our results.
select column1, column2, ...
from schema_name.table_name
where filter1 and|or filter2 and|or ...
Note that multiple filters can be applied at the same time, by separating them with and and or keywords.
Example
An example of an SQL select-query with filtering is:
select * from sandbox.building
where construction_year > 2000 and status = 'in_use'
You can use all standard SQL operators as filter, such as =, >, <, >=, <=, <>, like and in.
For a full reference, check the W3 Schools SQL tutorial
Ordering¶
Normally, a query returns the results in random ordering. However, it is possible to sort the results in a particular order.
select column1, column2, ...
from schema_name.table_name
order by column1, column2, ... desc|asc
order by keyword is used to sort the result-set in ascending or descending order.
Example
An example of an SQL select-query with ordering is:
select * from sandbox.building
order by construction_year desc
Note
In case ordering is not necessary per se, it is recommended to not specify it in the query, as it takes time to order the results (resulting in a slower query).
Limiting¶
The limit keyword specifies the number of table rows to return. This is useful for queries on large tables with
thousands of records, since returning the entire table may return many GB of data and take a lot of time to execute.
It is therefore a smart strategy to retrieve the data in smaller batches of several thousand rows at once.
select column1, column2, ...
from schema_name.table_name
limit ...
Example
Selecting a small batch of rows from a table:
select * from sandbox.building
limit 300
Functions¶
One powerful feature of databases is that they can also apply functions to the table data.
Such functions are for example min() (minimum value of a column), max() (maximum value of a column),
count() (number of rows), avg() (average of a column) and sum() (sum of a column).
select function(column1), ...
from schema_name.table_name
Example
Return the average building construction year:
select avg(construction_year) from sandbox.building
Note
It is also possible to execute spatial functions. Those always have the prefix 'st_'.
Examples of common spatial functions are st_intersect(), st_area(), st_within() and st_distance().
For example, we can calculate the surface area for any table column with a geometry-datatype.
select st_area(geometry)
from schema_name.table_name
Combining query elements¶
It is common to combine multiple elements in a single query, in order to get specific data calculations. Note that, when combining elements, the order matters!
select column1, function(column2), ...
from schema_name.table_name
where ...
order by ...
limit ...
Example
Selecting the 10 newest buildings that are already in use:
select * from sandbox.building
where status = 'in_use'
order by construction_year desc
limit 10
Creating a new table¶
The create table statement creates a new (empty) table in the database.
create table schema_name.table_name (
column1 datatype1,
column2 datatype2,
column3 datatype3,
...
)
Several datatypes are supported, such as text (strings), float (numbers), int (integers), geometry (spatial geometry),
timestamptz (timezone-aware timestamps), etcetera.
Example
Create a table called calculation_result, containing the average temperature and rainfall for several cities:
create table sandbox_calculation.calculation_result (
id int,
city_name text,
avg_temperature float,
avg_rainfall float
)
Inserting data¶
Fill a table with data with the insert into statement.
insert into schema_name.table_name
values (value1, value2, value3)
Example
Insert calculation output values into a table called calculation_result, containing the average temperature and rainfall for several cities:
insert into sandbox_calculation.calculation_result
values (1, 'New York', 18.7, 1654.5)
Advanced queries and relation to Nexus¶
SQL contains a lot of other powerful features that have not been covered in this basic guide, such as table joins, conditional statements and specialized text-, geometrical and mathematical functions and operations. Nexus uses PostgreSQL as underlying database system and PostGIS as spatial extension, which means that a whole range of additional functions and operators on top of standard SQL is available for use.
For a full reference on the possibilities, check the following resources:
- W3 Schools SQL tutorial for a detailed explanation of basic SQL features
- PostgreSQL reference for an overview of all additional PostgreSQL-specific features available in Nexus
- PostGIS reference for an overview of all additional PostGIS spatial functions available in Nexus