Advanced SQL

At the end of this week, you will be able to:

Advanced SQL commands

SQL commands to return all rows from two tables:

SELECT column(s)
FROM table1
UNION ALL
SELECT column(s)
FROM table2;

SQL commands to return only rows that exist in both tables:

SELECT column(s)
FROM table1
INTERSECT
SELECT column(s)
FROM table2;

SQL commands to return all rows in the first SELECT but excludes those by the second SELECT:

SELECT col1,col2,...
FROM table1
EXCEPT
SELECT col1,col2,...
FROM table2;

SQL command to specify the number of records to return:

SELECTTOPnumber | percent column_names(s) <br/>FROM table_name;`

Subqueries

A Subquery is a SQL query nested inside a SQL query. Very useful to create a virtual table usable by the main query.

SELECT column(s)
FROM table1
WHERE value IN
(SELECT column_name
FROM tables2
WHERE conditions);

Joins

Relational databases are defined with tables or entities such Employee and Department. To create a link between the two tables a column is defined as Department_ID in both tables. Now, if you would like to extract employee names and departments names you need to SQL JOIN.

There are four main type of joins:

JOIN Action
INNER JOIN return records that have matching values in both tables
LEFT JOIN return all records from table1 (LEFT table1) and the matched records from table2
RIGHT JOIN return all records from table2 (RIGHT table1) and the matched records from table1.
FULL JOIN() return all rows from both tables

Syntax:

SELECT table1.col_name, table2.col_name
FROM table1
INNER JOIN table2ON table1.col_name = table2.col_name;

OR

SELECT table1.col_name, table2.col_name
FROM table1
LEFT JOIN table2ON table1.col_name = table2.col_name;

OR

SELECT table1.col_name, table2.col_name
FROM table1
RIGHT JOIN table2ON table1.col_name = table2.col_name;

OR

SELECT table1.col_name, table2.col_name
FROM table1
FULL JOIN table2ON table1.col_name = table2.col_name;

Example: Pilot Certification Data

Data was obtained from the Federation Aviation Administration (FAA) in June 2023 on pilot certification records and contained the following:

  • Pilot ID,

  • CertLevel: the certification level (Airline, Commercial, Student, Sport, Private, and Recreational),

  • STATE: the USA state,

  • MedClass: the medical class,

  • MedExpMonth: the medical expire month, and

  • MedExpYear: the medical expire year.

R offers packages that allow the user to run SQL queries to query data frame like they would query a database table. We will use the sqldf package with the function sqldf().

library(sqldf)
Loading required package: gsubfn
Loading required package: proto
Warning in doTryCatch(return(expr), name, parentenv, handler): unable to load shared object '/Library/Frameworks/R.framework/Resources/modules//R_X11.so':
  dlopen(/Library/Frameworks/R.framework/Resources/modules//R_X11.so, 0x0006): Library not loaded: /opt/X11/lib/libSM.6.dylib
  Referenced from: <5D128DE5-8E3C-3CF8-9A4F-8D762BB79C4E> /Library/Frameworks/R.framework/Versions/4.2/Resources/modules/R_X11.so
  Reason: tried: '/opt/X11/lib/libSM.6.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/X11/lib/libSM.6.dylib' (no such file), '/opt/X11/lib/libSM.6.dylib' (no such file), '/Library/Frameworks/R.framework/Resources/lib/libSM.6.dylib' (no such file), '/Library/Java/JavaVirtualMachines/jdk1.8.0_241.jdk/Contents/Home/jre/lib/server/libSM.6.dylib' (no such file)
tcltk DLL is linked to '/opt/X11/lib/libX11.6.dylib'
Could not load tcltk.  Will use slower R code instead.
Loading required package: RSQLite
pilots = read.csv(file = "../datasets/pilotsCertFAA2023.csv")

# Counts how many pilots in each state
res_query1 = sqldf("SELECT STATE, counts 
FROM (SELECT STATE, ID,COUNT(ID) counts 
FROM pilots 
GROUP BY STATE)")

# show the first 6 data points
head(res_query1)
  STATE counts
1  <NA>  20586
2    AA      5
3    AE    200
4    AK   6377
5    AL   5931
6    AP    291
# Counts how many pilots in each state and rank them from highest to lowest

res_query2 = sqldf("SELECT STATE, counts 
FROM (SELECT STATE, ID,COUNT(ID) counts 
FROM pilots 
GROUP BY STATE)
ORDER by counts desc")

🛎 🎙️ Recordings on Canvas will cover more details and examples! Have fun learning and coding 😃! Let me know how I can help!

📚 👈 Assignments - Advanced SQL

Instructions are posted on Canvas.