A Subquery is a SQL query nested inside a SQL query. Very useful to create a virtual table usable by the main query.
SELECT column(s) FROM table1 WHERE value IN (SELECT column_name FROM tables2 WHERE conditions);
Joins
Relational databases are defined with tables or entities such Employee and Department. To create a link between the two tables a column is defined as Department_ID in both tables. Now, if you would like to extract employee names and departments names you need to SQL JOIN.
There are four main type of joins:
JOIN
Action
INNER JOIN
return records that have matching values in both tables
LEFT JOIN
return all records from table1 (LEFT table1) and the matched records from table2
RIGHT JOIN
return all records from table2 (RIGHT table1) and the matched records from table1.
FULL JOIN()
return all rows from both tables
Syntax:
SELECT table1.col_name, table2.col_name FROM table1 INNER JOINtable2ONtable1.col_name = table2.col_name;
OR
SELECT table1.col_name, table2.col_name FROM table1 LEFT JOINtable2ONtable1.col_name = table2.col_name;
OR
SELECT table1.col_name, table2.col_name FROM table1 RIGHT JOINtable2ONtable1.col_name = table2.col_name;
OR
SELECT table1.col_name, table2.col_name FROM table1 FULL JOINtable2ONtable1.col_name = table2.col_name;
Example: Pilot Certification Data
Data was obtained from the Federation Aviation Administration (FAA) in June 2023 on pilot certification records and contained the following:
Pilot ID,
CertLevel: the certification level (Airline, Commercial, Student, Sport, Private, and Recreational),
STATE: the USA state,
MedClass: the medical class,
MedExpMonth: the medical expire month, and
MedExpYear: the medical expire year.
R offers packages that allow the user to run SQL queries to query data frame like they would query a database table. We will use the sqldf package with the function sqldf().
library(sqldf)
Loading required package: gsubfn
Loading required package: proto
Warning in doTryCatch(return(expr), name, parentenv, handler): unable to load shared object '/Library/Frameworks/R.framework/Resources/modules//R_X11.so':
dlopen(/Library/Frameworks/R.framework/Resources/modules//R_X11.so, 0x0006): Library not loaded: /opt/X11/lib/libSM.6.dylib
Referenced from: <5D128DE5-8E3C-3CF8-9A4F-8D762BB79C4E> /Library/Frameworks/R.framework/Versions/4.2/Resources/modules/R_X11.so
Reason: tried: '/opt/X11/lib/libSM.6.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/X11/lib/libSM.6.dylib' (no such file), '/opt/X11/lib/libSM.6.dylib' (no such file), '/Library/Frameworks/R.framework/Resources/lib/libSM.6.dylib' (no such file), '/Library/Java/JavaVirtualMachines/jdk1.8.0_241.jdk/Contents/Home/jre/lib/server/libSM.6.dylib' (no such file)
tcltk DLL is linked to '/opt/X11/lib/libX11.6.dylib'
Could not load tcltk. Will use slower R code instead.
Loading required package: RSQLite
pilots =read.csv(file ="../datasets/pilotsCertFAA2023.csv")# Counts how many pilots in each stateres_query1 =sqldf("SELECT STATE, counts FROM (SELECT STATE, ID,COUNT(ID) counts FROM pilots GROUP BY STATE)")# show the first 6 data pointshead(res_query1)
STATE counts
1 <NA> 20586
2 AA 5
3 AE 200
4 AK 6377
5 AL 5931
6 AP 291
# Counts how many pilots in each state and rank them from highest to lowestres_query2 =sqldf("SELECT STATE, counts FROM (SELECT STATE, ID,COUNT(ID) counts FROM pilots GROUP BY STATE)ORDER by counts desc")
🛎 🎙️ Recordings on Canvas will cover more details and examples! Have fun learning and coding 😃! Let me know how I can help!