A Visual Introduction To SQL Joins
Coders' Corner
Paper 72-27
Abstract
Real systems rarely store all their data in one large table. Doing so would require maintaining duplicate copies of the same values and could threaten the integrity of the data. Instead, IT departments almost always divide their data among several tables. As a result, a method is needed to access two or more tables simultaneously to help answer the interesting questions about the data. This paper visually illustrates how the join process works and then shows how an SQL query is constructed so that some or all of the specified tables' contents can be brought together.
Joins are specified on a minimum of two tables at a time, where a column from each table is used for the purpose of connecting the two tables. Connecting columns should have "like" values and the same datatype attributes since the join's success is dependent on these values.
Example Tables
A relational database is simply a collection of tables. Each table contains one or more columns and one or more rows of data. The examples presented in this paper apply an example database consisting of three tables: CUSTOMERS, MOVIES, and ACTORS. Each table appears below.

CUSTOMERS
CUST_NO  NAME         CITY       STATE
11321    John Smith   Miami      FL
44555    Alice Jones  Baltimore  MD
21713    Ryan Adams   Atlanta    GA

MOVIES
CUST_NO  MOVIE_NO  RATING  CATEGORY
44555    1011      PG-13   Adventure
21713    3090      G       Comedy
44555    2198      G       Comedy
37753    4456      PG      Suspense

ACTORS
MOVIE_NO  LEAD_ACTOR
1011      Mel Gibson
2198      Clint Eastwood
3090      Sylvester Stallone
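For readers who want to experiment with the example data outside SAS, the three tables can be rebuilt in a few lines. This is a minimal sketch, not the paper's method: it assumes Python's standard sqlite3 module and an in-memory SQLite database as a stand-in for a SAS library, the column types (INTEGER, TEXT) are assumptions, and the pairing of values into rows follows the order in which the values are listed for each column.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the SAS library.
# Column types are guesses; the paper does not state them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMERS (CUST_NO INTEGER, NAME TEXT, CITY TEXT, STATE TEXT);
    CREATE TABLE MOVIES (CUST_NO INTEGER, MOVIE_NO INTEGER, RATING TEXT, CATEGORY TEXT);
    CREATE TABLE ACTORS (MOVIE_NO INTEGER, LEAD_ACTOR TEXT);
""")

# Rows taken from the example tables in this section.
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?)",
    [
        (11321, "John Smith", "Miami", "FL"),
        (44555, "Alice Jones", "Baltimore", "MD"),
        (21713, "Ryan Adams", "Atlanta", "GA"),
    ],
)
conn.executemany(
    "INSERT INTO MOVIES VALUES (?, ?, ?, ?)",
    [
        (44555, 1011, "PG-13", "Adventure"),
        (21713, 3090, "G", "Comedy"),
        (44555, 2198, "G", "Comedy"),
        (37753, 4456, "PG", "Suspense"),
    ],
)
conn.executemany(
    "INSERT INTO ACTORS VALUES (?, ?)",
    [
        (1011, "Mel Gibson"),
        (2198, "Clint Eastwood"),
        (3090, "Sylvester Stallone"),
    ],
)

# Count the rows in each table as a quick sanity check.
row_counts = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("CUSTOMERS", "MOVIES", "ACTORS")
}
print(row_counts)
```

Note that CUST_NO 11321 has no row in MOVIES and CUST_NO 37753 has no row in CUSTOMERS, which makes the data useful for seeing which rows a join keeps.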
Introduction
Joining two or more tables of data is a powerful feature of the relational model and the SQL procedure. Information in a database system is rarely stored in a single table because doing so would duplicate data values. Duplicated data values are not only inefficient but also make queries and updates more complex. As a result, data is split between two or more tables. The SQL procedure is a simple and flexible tool for joining tables of data together. This paper presents the importance of joins and shows how joins are performed without a WHERE clause, with a WHERE clause, with table aliases, and with three tables of data. Many of these techniques can be accomplished using other methods, but the simplicity and flexibility of the SQL procedure make it especially interesting, if not indispensable, as a tool for the information practitioner.
SQL Joins
A join of two or more tables provides a means of gathering and manipulating data in a single SELECT statement. A "JOIN" statement does not exist in the SQL language. Instead, two or more tables are joined by listing the table names in the FROM clause of a SELECT statement and, usually, stating the connecting condition in a WHERE clause. A comma separates each table specified in an inner join.
The following SQL code references a join on two tables with CUST_NO specified as the connecting column.
SUGI 27
PROC SQL;
  SELECT *
    FROM CUSTOMERS, MOVIES
    WHERE CUSTOMERS.CUST_NO = MOVIES.CUST_NO;
QUIT;

In this example, tables CUSTOMERS and MOVIES are used. Each table has a common column, CUST_NO, which is used to connect rows from each table when the values of CUST_NO are equal, as specified in the WHERE clause. A WHERE clause restricts which rows of data are included in the resulting join.
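To see which rows this join keeps, the same query can be run against the example data. The sketch below is an illustration under assumptions, not the paper's code: it uses Python's sqlite3 module in place of PROC SQL, guesses the column types, selects three named columns instead of `*` to keep the output narrow, and adds an ORDER BY only to make the output order predictable.

```python
import sqlite3

# Assumption: SQLite stands in for PROC SQL; rows are the paper's example data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMERS (CUST_NO INTEGER, NAME TEXT, CITY TEXT, STATE TEXT);
    CREATE TABLE MOVIES (CUST_NO INTEGER, MOVIE_NO INTEGER, RATING TEXT, CATEGORY TEXT);
""")
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?)",
    [
        (11321, "John Smith", "Miami", "FL"),
        (44555, "Alice Jones", "Baltimore", "MD"),
        (21713, "Ryan Adams", "Atlanta", "GA"),
    ],
)
conn.executemany(
    "INSERT INTO MOVIES VALUES (?, ?, ?, ?)",
    [
        (44555, 1011, "PG-13", "Adventure"),
        (21713, 3090, "G", "Comedy"),
        (44555, 2198, "G", "Comedy"),
        (37753, 4456, "PG", "Suspense"),
    ],
)

# Comma-separated tables in FROM, connecting condition in WHERE.
joined = conn.execute("""
    SELECT CUSTOMERS.CUST_NO, CUSTOMERS.NAME, MOVIES.MOVIE_NO
      FROM CUSTOMERS, MOVIES
     WHERE CUSTOMERS.CUST_NO = MOVIES.CUST_NO
     ORDER BY CUSTOMERS.CUST_NO, MOVIES.MOVIE_NO
""").fetchall()

for row in joined:
    print(row)
# (21713, 'Ryan Adams', 3090)
# (44555, 'Alice Jones', 1011)
# (44555, 'Alice Jones', 2198)
```

John Smith (11321) has no matching row in MOVIES, and CUST_NO 37753 has no matching row in CUSTOMERS, so neither appears in the result.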
The following SQL code illustrates a join on two tables with MOVIE_NO specified as the connecting column. The table aliases M and A are assigned in the FROM clause and used to qualify column names in the SELECT list and in the WHERE clause.

PROC SQL;
  SELECT M.MOVIE_NO, M.RATING, A.LEAD_ACTOR
    FROM MOVIES M, ACTORS A
    WHERE M.MOVIE_NO = A.MOVIE_NO;
QUIT;
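The aliased join can be tried against the example data as well. This is a hedged sketch that assumes Python's sqlite3 module as a substitute for PROC SQL and assumed column types; the actor column is named LEAD_ACTOR, as in the ACTORS table, and the ORDER BY is added only so the output order is predictable.

```python
import sqlite3

# Assumption: SQLite stands in for PROC SQL; rows are the paper's example data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MOVIES (CUST_NO INTEGER, MOVIE_NO INTEGER, RATING TEXT, CATEGORY TEXT);
    CREATE TABLE ACTORS (MOVIE_NO INTEGER, LEAD_ACTOR TEXT);
""")
conn.executemany(
    "INSERT INTO MOVIES VALUES (?, ?, ?, ?)",
    [
        (44555, 1011, "PG-13", "Adventure"),
        (21713, 3090, "G", "Comedy"),
        (44555, 2198, "G", "Comedy"),
        (37753, 4456, "PG", "Suspense"),
    ],
)
conn.executemany(
    "INSERT INTO ACTORS VALUES (?, ?)",
    [
        (1011, "Mel Gibson"),
        (2198, "Clint Eastwood"),
        (3090, "Sylvester Stallone"),
    ],
)

# M and A are aliases declared in FROM and reused in SELECT and WHERE.
rows = conn.execute("""
    SELECT M.MOVIE_NO, M.RATING, A.LEAD_ACTOR
      FROM MOVIES M, ACTORS A
     WHERE M.MOVIE_NO = A.MOVIE_NO
     ORDER BY M.MOVIE_NO
""").fetchall()

for row in rows:
    print(row)
# (1011, 'PG-13', 'Mel Gibson')
# (2198, 'G', 'Clint Eastwood')
# (3090, 'G', 'Sylvester Stallone')
```

Movie 4456 has no row in ACTORS, so it is absent from the result.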
To inspect the results of a Cartesian product, in which every row of one table is paired with every row of the other, you could submit the earlier CUSTOMERS and MOVIES join without its WHERE clause.

PROC SQL;
  SELECT *
    FROM CUSTOMERS, MOVIES;
QUIT;
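The size of a Cartesian product is easy to check on the example data: three CUSTOMERS rows paired with four MOVIES rows give twelve rows. The sketch below assumes Python's sqlite3 module as a stand-in for PROC SQL and assumed column types; it also filters the product in Python to show that the WHERE clause of the earlier join simply keeps the pairs whose CUST_NO values match.

```python
import sqlite3

# Assumption: SQLite stands in for PROC SQL; rows are the paper's example data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMERS (CUST_NO INTEGER, NAME TEXT, CITY TEXT, STATE TEXT);
    CREATE TABLE MOVIES (CUST_NO INTEGER, MOVIE_NO INTEGER, RATING TEXT, CATEGORY TEXT);
""")
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?)",
    [
        (11321, "John Smith", "Miami", "FL"),
        (44555, "Alice Jones", "Baltimore", "MD"),
        (21713, "Ryan Adams", "Atlanta", "GA"),
    ],
)
conn.executemany(
    "INSERT INTO MOVIES VALUES (?, ?, ?, ?)",
    [
        (44555, 1011, "PG-13", "Adventure"),
        (21713, 3090, "G", "Comedy"),
        (44555, 2198, "G", "Comedy"),
        (37753, 4456, "PG", "Suspense"),
    ],
)

# No WHERE clause: every CUSTOMERS row is paired with every MOVIES row.
product = conn.execute("SELECT * FROM CUSTOMERS, MOVIES").fetchall()
print(len(product))  # 12 (3 customers x 4 movies)

# Columns 0-3 come from CUSTOMERS and 4-7 from MOVIES, so positions 0 and 4
# hold the two CUST_NO values. Keeping equal pairs mimics the WHERE clause.
matching = [row for row in product if row[0] == row[4]]
print(len(matching))  # 3
```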
Table Aliases
Table aliases provide a "short-cut" way to reference one or more tables within a join operation. One or more aliases are specified so that columns can be selected with a minimal number of keystrokes. To illustrate how table aliases work in a join, the following diagram shows a two-table join linked on MOVIE_NO.

MOVIES M                                 ACTORS A
Cust_no  Movie_no  Rating  Category      Movie_no  Lead_actor
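Aliases pay off most when more than two tables are involved, as in the three-table join mentioned in the introduction. The following is only a sketch under assumptions: it uses Python's sqlite3 module rather than PROC SQL, the alias C for CUSTOMERS and the selected columns are illustrative choices, the column types are guesses, and the ORDER BY is added only so the output order is predictable.

```python
import sqlite3

# Assumption: SQLite stands in for PROC SQL; rows are the paper's example data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMERS (CUST_NO INTEGER, NAME TEXT, CITY TEXT, STATE TEXT);
    CREATE TABLE MOVIES (CUST_NO INTEGER, MOVIE_NO INTEGER, RATING TEXT, CATEGORY TEXT);
    CREATE TABLE ACTORS (MOVIE_NO INTEGER, LEAD_ACTOR TEXT);
""")
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?)",
    [
        (11321, "John Smith", "Miami", "FL"),
        (44555, "Alice Jones", "Baltimore", "MD"),
        (21713, "Ryan Adams", "Atlanta", "GA"),
    ],
)
conn.executemany(
    "INSERT INTO MOVIES VALUES (?, ?, ?, ?)",
    [
        (44555, 1011, "PG-13", "Adventure"),
        (21713, 3090, "G", "Comedy"),
        (44555, 2198, "G", "Comedy"),
        (37753, 4456, "PG", "Suspense"),
    ],
)
conn.executemany(
    "INSERT INTO ACTORS VALUES (?, ?)",
    [
        (1011, "Mel Gibson"),
        (2198, "Clint Eastwood"),
        (3090, "Sylvester Stallone"),
    ],
)

# Three tables, two connecting conditions: CUST_NO links C to M,
# and MOVIE_NO links M to A.
three_way = conn.execute("""
    SELECT C.NAME, M.MOVIE_NO, A.LEAD_ACTOR
      FROM CUSTOMERS C, MOVIES M, ACTORS A
     WHERE C.CUST_NO = M.CUST_NO
       AND M.MOVIE_NO = A.MOVIE_NO
     ORDER BY C.NAME, M.MOVIE_NO
""").fetchall()

for row in three_way:
    print(row)
# ('Alice Jones', 1011, 'Mel Gibson')
# ('Alice Jones', 2198, 'Clint Eastwood')
# ('Ryan Adams', 3090, 'Sylvester Stallone')
```

Without aliases, each qualified column would need the full table name, which becomes unwieldy as the number of tables grows.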
Conclusion
The SQL procedure provides a powerful way to join two or more tables of data. It is easy to learn and use. More importantly, since the SQL procedure follows ANSI (American National Standards Institute) guidelines, your knowledge is portable to other platforms and vendor implementations. The simplicity and flexibility of performing joins with the SQL procedure make it an especially interesting, if not indispensable, tool for the information practitioner.
Trademark Citations
SAS, SAS Alliance Partner, and SAS Certified Professional are registered trademarks of SAS Institute Inc. in the USA and other countries. The ® symbol indicates USA registration.
Acknowledgments
I would like to thank Richard Livornese of TIER (Co-Chair, Coders' Corner) and Duke Owen of Westat (Co-Chair, Coders' Corner) for accepting my abstract and paper, as well as the SUGI Leadership for their support of a great conference.
Kirk is a SAS Alliance Partner and SAS Certified Professional with 25 years of SAS software experience. He has written over one hundred articles for professional journals and SAS User Group proceedings. His popular SAS Tips column appears regularly in the SANDS and SESUG Newsletters. His expertise includes application design and development, training, and programming using base SAS, SQL, ODS, SAS/FSP, SAS/AF, SCL, FRAME, and SAS/EIS software.
Comments and suggestions can be sent to:
Kirk Paul Lafler
Software Intelligence Corporation
P.O. Box 1390
Spring Valley, California 91979-1390
E-mail: [email protected]
http://www.software-intelligence.com
Voice: 619.660.2400