forked from TheDataMine/the-examples-book
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path03-sql.Rmd
982 lines (737 loc) · 33.1 KB
/
03-sql.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
# SQL {#sql}
```{r, include=F}
# library(RMariaDB)
library(RSQLite)
library(DBI)
# Establish a connection to sqlite databases
chinook <- dbConnect(RSQLite::SQLite(), "chinook.db")
lahman <- dbConnect(RSQLite::SQLite(), "lahman.db")
# Establish a connection to mysql databases
# connection <- dbConnect(RMariaDB::MariaDB(),
# host="scholar-db.rcac.purdue.edu",
# db="elections",
# user="elections_user",
# password="Dataelect!98")
```
```{r, eval=F}
library(RMariaDB)
library(RSQLite)
library(DBI)
# Establish a connection to sqlite databases
chinook <- dbConnect(RSQLite::SQLite(), "chinook.db")
lahman <- dbConnect(RSQLite::SQLite(), "lahman.db")
# Establish a connection to mysql databases
connection <- dbConnect(RMariaDB::MariaDB(),
host="your-host.com",
db="your-database-name",
user="your-username",
password="your-password")
```
## Joins {#sql-joins}
Joins are SQL clauses that combine data from two tables. There are 4 primary types of SQL joins: INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, and FULL OUTER JOIN.
When talking about an SQL JOIN statement, sometimes the first table in the SQL statement is referred to as the "left" table, and the second table is referred to as the right table. For instance, in the following query, `A` is the left table and `B` is the right table.
```{sql, eval=F}
SELECT * FROM A INNER JOIN B ON A.id=B.a_id;
```
While there can be cases where using RIGHT JOIN and FULL JOIN can make your SQL statement more concise, both RIGHT JOIN and FULL JOIN are redundant and can be fully emulated using LEFT JOIN and UNION ALL clauses.
For the purposes of illustration, we will be using a common example of a database for an online store. This online store has two primary tables, `orders` and `customers`, shown below.
**orders**
|id|description|customer_id|value|
|--|-----------|-----------|----:|
|1|Water bottle|1|15.00|
|2|Key chain|1|7.50|
|3|Computer|3|2000.00|
|4|Thumb drive|3|25.00|
|5|Notebook|4|9.00|
|6|Shampoo||5.00|
|7|Paper||4.00|
**customers**
|id|first_name|last_name|email|
|--|----------|---------|-----|
|1|Natalie|Wright|[email protected]|
|2|Ana|Sousa|[email protected]|
|3|Ben|Schwartz|[email protected]|
|4|Chen|Xi|[email protected]|
|5|Frank|Zhang|[email protected]|
|6|Tianchi|Liu|[email protected]|
|7|Jake|Jons|[email protected]|
### INNER JOIN {#sql-inner-join}
An INNER JOIN, often referred to as simply JOIN, returns rows/records where there is a match in the right table from the left table. Records from the left table that don't have a match in the right table are excluded. Records from the right table that don't have a match in the left table are also excluded.
This is appropriate any time you need data from two separate tables, but only when the two tables have something in common. For example, what if our online company decided it wanted to query the database to send an email of appreciation for all customers who have placed at least 1 order. In this case, we want _only_ the emails of those who don't appear in _both_ the **customers** and **orders** table.
```{sql, eval=F}
SELECT customers.email FROM orders INNER JOIN customers ON orders.customer_id=customers.id;
```
Which would result in the following table.
|email|
|-----|
### LEFT OUTER JOIN {#sql-left-join}
A LEFT OUTER JOIN, often referred to as simply a LEFT JOIN, returns rows/records where every value in the left table is present in addition to additional data from the right table, when there exists a match in the right table.
This is appropriate any time you want all of the data from the left table, and any extra data from the right table if there happens to be a match. For example, what if our online company wanted a list of all orders placed, and if the order wasn't placed from a guest account, send an email to the customer thanking them for their purchase? In this case, it would make sense to append email information to the order when there is a match.
```{sql, eval=F}
SELECT orders.description, orders.value, customers.email FROM orders LEFT JOIN customers ON order.customer_id=customers.id;
```
Which would result in the following table, enabling the employee to see orders as well as send out thank you emails.
|description|value|first_name|last_name|email|
|-----------|----:|
|Water bottle|15.00|Natalie|Wright|[email protected]|
|Key chain|7.50|Natalie|Wright|[email protected]|
|Computer|2000.00|Ben|Schwartz|[email protected]|
|Thumb drive|25.00|Ben|Schwartz|[email protected]|
|Notebook|9.00|Chen|Xi|[email protected]|
|Shampoo|5.00||||
|Paper|4.00||||
Had we instead used an INNER JOIN, our list would be missing critical order information.
```{sql, eval=F}
SELECT orders.description, orders.value, customers.email FROM orders INNER JOIN customers ON order.customer_id=customers.id;
```
|description|value|first_name|last_name|email|
|-----------|----:|
|Water bottle|15.00|Natalie|Wright|[email protected]|
|Key chain|7.50|Natalie|Wright|[email protected]|
|Computer|2000.00|Ben|Schwartz|[email protected]|
|Thumb drive|25.00|Ben|Schwartz|[email protected]|
|Notebook|9.00|Chen|Xi|[email protected]|
## Aliasing
Aliasing is the process of giving a table or a table column a temporary name. Aliases are commonly used to either make the query easier to write, or more readable. An example of using table aliases to make a query shorter would be the following.
```{sql, eval=F}
SELECT orders.description, orders.value, customers.email FROM orders INNER JOIN customers ON order.customer_id=customers.id;
```
By using table aliases, this can be reduced greatly.
```{sql, eval=F}
SELECT o.description, o.value, c.email FROM orders AS o INNER JOIN customers AS c ON o.customer_id=c.id;
```
Note that aliases only last for the duration of a single query. If we were to subsequently use the following query, it would fail.
```{sql, eval=F}
SELECT o.description, o.value, c.email FROM o INNER JOIN c ON o.customer_id=c.id;
```
In addition to table aliases, we can give fields aliases as well. For example, we could reduce `customer_id` to just `c_id`.
```{sql, eval=F}
SELECT orders.customer_id AS c_id FROM orders INNER JOIN customers ON order.c_id=customers.id;
```
Alternatively, we could change `customer_id` to `Customer ID`, however, whenever we want an alias to contain spaces, we need to use either double quotes or square brackets.
```{sql, eval=F}
SELECT orders.customer_id AS "Customer ID" FROM orders INNER JOIN customers ON order."Customer ID"=customers.id;
```
### RDBMS {#sql-rdbms}
### SQL in R {#sql-in-r}
<iframe id="kaltura_player" src="https://cdnapisec.kaltura.com/p/983291/sp/98329100/embedIframeJs/uiconf_id/29134031/partner_id/983291?iframeembed=true&playerId=kaltura_player&entry_id=1_q1yr0jmc&flashvars[streamerType]=auto&flashvars[localizationCode]=en&flashvars[leadWithHTML5]=true&flashvars[sideBarContainer.plugin]=true&flashvars[sideBarContainer.position]=left&flashvars[sideBarContainer.clickToClose]=true&flashvars[chapters.plugin]=true&flashvars[chapters.layout]=vertical&flashvars[chapters.thumbnailRotator]=false&flashvars[streamSelector.plugin]=true&flashvars[EmbedPlayer.SpinnerTarget]=videoHolder&flashvars[dualScreen.plugin]=true&flashvars[Kaltura.addCrossoriginToIframe]=true&&wid=1_52r9px1c" allowfullscreen webkitallowfullscreen mozAllowFullScreen allow="autoplay *; fullscreen *; encrypted-media *" sandbox="allow-forms allow-same-origin allow-scripts allow-top-navigation allow-pointer-lock allow-popups allow-modals allow-orientation-lock allow-popups-to-escape-sandbox allow-presentation allow-top-navigation-by-user-activation" frameborder="0" title="Kaltura Player"></iframe>
#### Examples {#sql-in-r-examples}
Please see [here](https://raw.githubusercontent.com/TheDataMine/the-examples-book/master/files/think-summer-examples-2020.pdf) for a variety of examples demonstrating using SQL within R.
### SQL in Python {#sql-in-python}
### Examples {#sql-examples}
The following examples use the `lahman.db` sqlite database.
#### Display the first 10 ballparks in the `ballparks` table.
<details>
<summary>Click here for solution</summary>
```{sql, connection=lahman}
SELECT * FROM parks LIMIT 10;
```
</details>
#### Make a list of the names of all of the inactive teams in baseball history. {#sql-where}
<details>
<summary>Click here for solution</summary>
Remove the LIMIT 10 for full results.
```{sql, connection=lahman}
SELECT franchName FROM teamsfranchises WHERE active=='N' LIMIT 10;
```
</details>
#### Find the player with the most Runs Batted In (RBIs) in a season in queries. In the first query find the playerID of the player with the most RBIs. In the second query find the player's name in the `people` table.
<details>
<summary>Click here for solution</summary>
In addition to his RBI record, Hack Wilson also held the NL home run record for a long time as well with 56. In 1999, Manny Ramirez tried to pursue the RBI record, but only was able to accrue 165 RBIs.
```{sql, connection=lahman}
-- Find the playerID
SELECT playerID FROM batting WHERE RBI==191;
-- Display the name
SELECT nameFirst, nameLast FROM people WHERE playerID=='wilsoha01';
```
</details>
#### Who was the manager of the 1976 "Big Red Machine" (CIN)? Complete this in 2 queries.
<details>
<summary>Click here for solution</summary>
The "Big Red Machine" was a famous nickname for the dominant Cincinnati Reds of the early 1970s. Many of its team members are Hall of Famers, including their manager, Sparky Anderson.
```{sql, connection=lahman}
SELECT playerID FROM managers
WHERE yearID==1976 AND teamID=='CIN';
SELECT nameFirst, nameLast FROM people
WHERE playerID=='andersp01';
```
</details>
#### Make a list of the teamIDs that were managed by Tony LaRussa. Complete this in 2 queries.
<details>
<summary>Click here for solution</summary>
Tony LaRussa is very well known for being a manager that was involved in baseball for a very long time. He won the World Series with the St. Louis Cardinals and the Oakland Athletics.
```{sql, connection=lahman}
SELECT playerID FROM people WHERE nameLast=='LaRussa' AND nameFirst=='Tony';
SELECT DISTINCT teamID FROM managers WHERE playerID=='larusto01';
```
</details>
#### What was Cecil Fielder's salary in 1987? Display the teamID with the salary.
<details>
<summary>Click here for solution</summary>
Cecil Fielder was a power hitting DH in the 1980s and 1990s. His son, Prince Fielder, played in the major leagues as well.
```{sql, connection=lahman}
SELECT playerID FROM people
WHERE nameFirst=='Cecil' AND nameLast=='Fielder';
SELECT teamID, salary FROM salaries
WHERE playerID=='fieldce01' AND yearID==1987;
```
</details>
#### Make a list of all the teams who have lost a World Series (WS) since 1990. Put the list in ascending order by `yearID`. {#sql-order-by}
<details>
<summary>Click here for solution</summary>
```{sql, connection=lahman}
SELECT teamIDloser, yearID FROM seriespost
WHERE yearID >= 1990 AND round=='WS'
ORDER BY yearID ASC LIMIT 10;
```
</details>
#### Let's find out about Cal Ripken, Jr. What was his height and weight? Did he bat right or left handed? When did he play his final game? Find all of this information in one query.
<details>
<summary>Click here for solution</summary>
Cal Ripken, Jr's nickname is the "Iron Man" of baseball due to the fact that he started in 2,632 straight games. That means in just over 16 seasons, Cal Ripken, Jr. never missed a game!
```{sql, connection=lahman}
SELECT height, weight, bats, finalgame FROM people
WHERE nameFirst=='Cal' AND nameLast=='Ripken'
AND deathState IS NULL;
```
</details>
#### Select all the playerIDs and yearIDs of the players who were inducted in the hall of fame and voted in by the Veterans committee, between 1990 and 2000. Put the list in descending order. {#sql-between}
<details>
<summary>Click here for solution</summary>
The veterans committee in the Hall of Fame voting process place players in the Hall of Fame that are forgotten by the writers, fans, etc. This is a way for players to recognize who they think were the greatest players of all time, or are skipped over for a variety of reasons. This is one reason why there is a lot of scrutiny in the process for how players are selected to the baseball hall of fame.
```{sql, connection=lahman}
SELECT playerID, yearID FROM halloffame
WHERE votedBy=='Veterans' AND inducted=='Y'
AND yearID BETWEEN 1990 AND 2000
ORDER BY yearID DESC LIMIT 10;
```
</details>
#### Get a list of the attendance by season of the Toronto Blue Jays (TOR). What season was the highest attendance?
<details>
<summary>Click here for solution</summary>
The Toronto Blue Jays were the 1993 season's World Series champion. This means that, yes, a non-USA team has won the World Series for baseball!
```{sql, connection=lahman}
SELECT yearkey, attendance FROM homegames
WHERE teamkey=='TOR'
ORDER BY attendance DESC LIMIT 10;
```
</details>
#### How many different leagues have represented Major League Baseball over time? {#sql-distinct}
<details>
<summary>Click here for solution</summary>
Major League Baseball has had several leagues that have been represented in its history. There are only two current leagues: National League and the American League.
```{sql, connection=lahman}
SELECT DISTINCT league FROM leagues;
```
</details>
#### Find the teams that have won the World Series.
<details>
<summary>Click here for solution</summary>
```{sql, connection=lahman}
SELECT teamID, yearID FROM teams WHERE WSWin=='Y' LIMIT 10;
```
</details>
#### List the top 10 season win totals of teams. Include the `yearID` and `teamID`.
<details>
<summary>Click here for solution</summary>
```{sql, connection=lahman}
SELECT teamID, yearID, W FROM teams ORDER BY W DESC LIMIT 10;
```
</details>
#### List the pitchers with their `teamID`, wins (`W`), and losses (`L`) that threw complete games (`CG`) in the 1995 season. Include their number of complete games as well.
<details>
<summary>Click here for solution</summary>
```{sql, connection=lahman}
SELECT playerID, teamID, W, L, CG FROM pitching
WHERE CG > 0 AND yearID==1995
ORDER BY W DESC LIMIT 10;
```
</details>
#### Get a printout of the Hits (`H`), and home runs (`HR`) of Ichiro Suzuki's career. Do this is in two queries. In the first query, find Ichiro Suzuki's `playerID`. In the second one list the `teamID`, `yearID`, hits and home runs.
<details>
<summary>Click here for solution</summary>
Ichiro Suzuki is regarded as one of the greatest hitters of all time because of his prowess in both American and Japanese professional baseball.
```{sql, connection=lahman}
SELECT playerID FROM people
WHERE nameFirst=='Ichiro' AND nameLast=='Suzuki';
SELECT teamID, yearID, H, HR FROM batting
WHERE playerID=='suzukic01';
```
</details>
#### How many walks (`BB`) and strikeouts (`SO`) did Mariano Rivera achieve in the playoffs? Which year did Mariano Rivera give up the most post-season walks?
<details>
<summary>Click here for solution</summary>
More men have walked on the moon than have scored a run on Mariano Rivera in a playoff game. Mariano Rivera made the hall of fame in 2019.
```{sql, connection=lahman}
SELECT playerID FROM people
WHERE nameFirst=='Mariano' AND nameLast=='Rivera';
SELECT yearID, teamID, BB, SO FROM pitchingpost
WHERE playerID=='riverma01'
ORDER BY BB DESC;
```
</details>
#### Find the pitcher with most strikeouts (`SO`), and the batter that struck out the most in the 2014 season. Get the first and last name of the pitcher and batter, respectively.
<details>
<summary>Click here for solution</summary>
Corey Kluber is a two-time AL Cy Young winner. He is well known for his two-seam fastball that is difficult to hit.
```{sql, connection=lahman}
SELECT playerID, SO FROM pitching
WHERE yearID==2014
ORDER BY SO DESC
LIMIT(10);
SELECT playerID, SO FROM batting
WHERE yearID==2014
ORDER BY SO DESC
LIMIT(10);
SELECT nameFirst,nameLast FROM people
WHERE playerID=="klubeco01" OR playerID=="howarry01";
```
</details>
#### How many different teams did Bartolo Colon pitch for?
<details>
<summary>Click here for solution</summary>
Bartolo Colon is a well-known journeyman pitcher in baseball. He has pitched with a lot of teams, but it wasn't until he played for the New York Mets when he needed to come to the plate. He had a weird batting stance that is funny to watch. He even [hit a home run](https://www.youtube.com/watch?v=OVFsq9FQBlc) one season!
```{sql, connection=lahman}
SELECT playerID FROM people
WHERE nameFirst=='Bartolo' AND nameLast=='Colon';
SELECT DISTINCT teamID FROM pitching
WHERE playerID=='colonba01';
```
</details>
#### How many times did Trevor Bauer come to bat (`AB`) in 2016? How many hits (`H`) did he get?
<details>
<summary>Click here for solution</summary>
Trevor Bauer is much more known for his pitching than he is known for hitting. This is common for pitchers, as many are not very good at hitting.
```{sql, connection=lahman}
SELECT playerID FROM people
WHERE nameFirst=="Trevor" AND nameLast=="Bauer";
```
```{sql, connection=lahman}
SELECT AB, H FROM batting
WHERE playerID=="bauertr01" AND yearID=="2016";
```
</details>
#### Let's compare Mike Trout and Giancarlo Stanton by season. Who has hit more RBIs in a season? Who has been caught stealing (`CS`) more in a season?
<details>
<summary>Click here for solution</summary>
Mike Trout and Giancarlo Stanton are considered two of the of the best hitters in Major League Baseball for very different reasons. Trout is an all-around player known for being indispensible, where Stanton is known as a power hitter.
```{sql, connection=lahman}
SELECT playerID, nameFirst, nameLast FROM people
WHERE (nameFirst=='Giancarlo' AND nameLast=='Stanton')
OR (nameFirst=='Mike' AND nameLast=='Trout');
```
```{sql, connection=lahman}
SELECT playerID, yearID, teamID, RBI, CS FROM batting
WHERE playerID=='stantmi03' OR playerID=='troutmi01'
ORDER BY RBI DESC LIMIT 1;
```
```{sql, connection=lahman}
SELECT playerID, yearID, teamID, RBI, CS FROM batting
WHERE playerID=='stantmi03' OR playerID=='troutmi01'
ORDER BY CS DESC LIMIT 1;
```
</details>
#### Make a list of players who walked (`BB`) more than they struck out (`SO`) between 1980 and 1985. Of these players, who walked the most? Use the `BETWEEN` command in this queries. Use a second query to get the player's first and last name.
<details>
<summary>Click here for solution</summary>
```{sql, connection=lahman}
SELECT playerID, yearID, teamID, BB, SO FROM batting
WHERE BB > SO LIMIT 10;
```
```{sql, connection=lahman}
SELECT nameFirst, nameLast FROM people WHERE playerID=='randowi01';
```
</details>
#### How many different NL catchers (C) won gold glove winners between 1990 and 2000?
<details>
<summary>Click here for solution</summary>
There were 6 different catchers.
```{sql, connection=lahman}
SELECT DISTINCT playerID FROM awardsplayers
WHERE awardID=='Gold Glove' AND notes=='C'
AND lgID=='NL' AND yearID BETWEEN 1990 AND 2000;
```
</details>
#### How many different 3rd Basemen played for the Seattle Mariners between 2000 and 2005? Who had the most Errors?
<details>
<summary>Click here for solution</summary>
```{sql, connection=lahman}
SELECT DISTINCT playerID, yearID, E FROM fielding WHERE
yearID BETWEEN 2000 AND 2005 AND teamID=='SEA'
AND POS=='3B'
ORDER BY E DESC LIMIT 10;
```
```{sql, connection=lahman}
SELECT nameFirst, nameLast FROM people
WHERE playerID=='camermi01';
```
</details>
#### Craig Biggio was more known for his play at second base over his major league baseball career, but he didn't always play second base. What seasons did Craig Biggio play Catcher?
<details>
<summary>Click here for solution</summary>
```{sql, connection=lahman}
SELECT playerID FROM people
WHERE nameFirst=='Craig' AND nameLast=='Biggio';
```
```{sql, connection=lahman}
SELECT teamID, yearID, POS FROM fielding
WHERE playerID=='biggicr01' AND POS=='C';
```
</details>
#### Find the teams that have won the World Series that represented the National League. Display the list with the `yearID` and `teamID` in ascending order.
<details>
<summary>Click here for solution</summary>
```{sql, connection=lahman}
SELECT teamID, yearID FROM teams
WHERE WSWin=='Y' AND lgID=='NL'
ORDER BY yearID ASC LIMIT 10;
```
</details>
#### List the pitchers that threw at least one complete game (CG) in the 1995 season. Please include the wins and losses of the top 10 pitchers. Use the playerID of the pitcher who threw the most complete games to find out the name of the pitcher that had the most complete games.
<details>
<summary>Click here for solution</summary>
```{sql, connection=lahman}
SELECT playerID, W, L, CG FROM pitching
WHERE CG > 0 AND yearID==1995
ORDER BY CG DESC
LIMIT 10;
```
```{sql, connection=lahman}
SELECT nameFirst, nameLast FROM people
WHERE playerID=='maddugr01';
```
</details>
#### Who was the most recent player manager?
<details>
<summary>Click here for solution</summary>
```{sql, connection=lahman}
SELECT playerID, yearID FROM managers
WHERE plyrMgr=='Y'
ORDER BY yearID DESC LIMIT 10;
```
```{sql, connection=lahman}
SELECT nameFirst, nameLast FROM people WHERE playerID=='rosepe01';
```
</details>
#### Get the at-bats, homeruns, stolen bases for Roberto Clemente by year in ascending order.
<details>
<summary>Click here for solution</summary>
Roberto Clemente is known as being a leader for the Pittsburgh Pirates. He died in a 1972 plane crash on a humanitarian mission to Puerto Rico, where he grew up.
```{sql, connection=lahman}
SELECT playerID FROM people
WHERE nameFirst=='Roberto' AND nameLast=='Clemente';
```
```{sql, connection=lahman}
SELECT yearID,AB,HR,SB FROM battingpost
WHERE playerID=='clemero01'
ORDER BY yearID ASC;
```
</details>
#### Get a list of distinct World Series winners from the years Tom Lasorda managed the Los Angeles Dodgers (LAN). First find the years Tom Lasorda was the manager of the Los Angeles Dodgers, and then find the distinct teams that won a World Series in that time frame.
<details>
<summary>Click here for solution</summary>
```{sql, connection=lahman}
SELECT playerID FROM people
WHERE nameFirst=='Tom' AND nameLast=='Lasorda';
```
```{sql, connection=lahman}
SELECT yearID FROM managers
WHERE playerID=='lasorto01' LIMIT 10;
```
```{sql, connection=lahman}
SELECT DISTINCT teamID FROM teams
WHERE WSWin=='Y' AND yearID BETWEEN 1976 AND 1996;
```
</details>
#### Which teams did Kenny Lofton steal more than 20 bases in a season after the year 2000?
<details>
<summary>Click here for solution</summary>
```{sql, connection=lahman}
SELECT playerID FROM people
WHERE nameFirst=='Kenny' AND nameLast=='Lofton';
```
```{sql, connection=lahman}
SELECT teamID, yearID, SB FROM batting
WHERE playerID=='loftoke01' AND SB > 20
AND yearID >2000;
```
</details>
#### How much did the Tampa Bay Rays (TBL) pay Wade Boggs in 1998? Who paid Boggs the most in a season during his career?
<details>
<summary>Click here for solution</summary>
```{sql, connection=lahman}
SELECT playerID FROM people
WHERE nameFirst=='Wade' AND nameLast=='Boggs';
```
```{sql, connection=lahman}
SELECT teamID, yearID, salary FROM salaries
WHERE playerID=='boggswa01'
AND yearID==1998;
```
```{sql, connection=lahman}
SELECT teamID, yearID, salary FROM salaries
WHERE playerID=='boggswa01'
ORDER BY salary DESC LIMIT 10;
```
</details>
####
<details>
<summary>Click here for solution</summary>
```{sql, connection=lahman}
SELECT teamID, yearID, W, L, HR, HRA, attendance FROM teams
WHERE teamID=='DET' AND (WSWin=='Y' OR LgWin=='Y');
```
</details>
#### The standings you would find in a newspaper often have Wins and Losses in order of most to least wins. There are often other numbers that are involved like winning percentage, and other team statistics, but we won't deal with that for now. Get the NL East Standings in 2015.
<details>
<summary>Click here for solution</summary>
```{sql, connection=lahman}
SELECT teamID, W, L FROM teams
WHERE divID=='E' AND lgID=='NL'
AND yearID==2015
ORDER BY teamrank ASC;
```
</details>
#### Make a list of the teams, wins, losses, years for NL East teams that have won the World Series. Which team had the most wins?
<details>
<summary>Click here for solution</summary>
```{sql, connection=lahman}
SELECT teamID, yearID, W, L FROM teams
WHERE lgID=='NL' AND divID=='E' AND WSWin=='Y'
ORDER BY W DESC;
```
</details>
#### Get a list of the `playerIDs` of managers who won more games than they lost between 1930 and 1950. Get the manager's name, and the name of the team of the manager with the most wins on the list.
<details>
<summary>Click here for solution</summary>
```{sql, connection=lahman}
SELECT playerID, teamID, yearID, W, L FROM managers
WHERE yearID BETWEEN 1930 AND 1950 AND W > L
ORDER BY W DESC LIMIT 10;
```
```{sql, connection=lahman}
SELECT nameFirst, nameLast FROM people
WHERE playerID=='mackco01';
```
```{sql, connection=lahman}
SELECT franchName FROM teamsfranchises
WHERE franchID=='PHA';
```
</details>
#### Get the top 5 seasons from Florida Teams (Florida Marlins, Tampa Bay Rays, and Miami Marlins) in attendance. How many have occured since 2000?
<details>
<summary>Click here for solution</summary>
Florida baseball teams are not known for their attendance for a variety of reasons. Both MLB franchises play in domed fields, but usually do not draw large crowds.
```{sql, connection=lahman}
SELECT franchID, franchName FROM teamsfranchises
WHERE franchName=='Tampa Bay Rays'
OR franchName=='Florida Marlins';
```
```{sql, connection=lahman}
SELECT teamID, yearID, attendance FROM teams
WHERE franchID=='TBD' OR franchID=='FLA'
ORDER BY attendance DESC LIMIT 10;
```
</details>
#### What pitcher has thrown the most Shutouts (SHO) in the AL since 2010? What about the NL? Please get their first and last names respectively.
<details>
<summary>Click here for solution</summary>
```{sql, connection=lahman}
SELECT playerID,teamID, yearID, SHO FROM pitching
WHERE yearID>2010 AND lgID=='NL'
ORDER BY SHO DESC LIMIT 10;
```
```{sql, connection=lahman}
SELECT playerID,teamID, yearID, SHO FROM pitching
WHERE yearID>2010 AND lgID=='AL'
ORDER BY SHO DESC LIMIT 10;
```
```{sql, connection=lahman}
SELECT nameFirst, nameLast FROM people
WHERE playerID=='leecl02' OR playerID=='hernafe02';
```
</details>
The following examples use the `chinook.db` sqlite database.
```{r}
dbListTables(chinook)
```
#### How do I select all of the rows of a table called employees?
<details>
<summary>Click here for solution</summary>
```{sql, connection=chinook}
SELECT * FROM employees;
```
</details>
#### How do I select the first 5 rows of a table called employees?
<details>
<summary>Click here for solution</summary>
```{sql, connection=chinook}
SELECT * FROM employees LIMIT 5;
```
</details>
#### How do I select specific rows of a table called employees?
<details>
<summary>Click here for solution</summary>
```{sql, connection=chinook}
SELECT LastName, FirstName FROM employees;
```
You can switch the order in which the columns are displayed as well:
```{sql, connection=chinook}
SELECT FirstName, LastName FROM employees;
```
</details>
#### How do I select only unique values from a column?
<details>
<summary>Click here for solution</summary>
```{sql, connection=chinook}
SELECT DISTINCT Title FROM employees;
```
</details>
#### How can I filter that match a certain criteria?
<details>
<summary>Click here for solution</summary>
Select only employees with a FirstName "Steve":
```{sql, connection=chinook}
SELECT * FROM employees WHERE FirstName='Steve';
```
Select only employees with FirstName "Steve" OR FirstName "Laura":
```{sql, connection=chinook}
SELECT * FROM employees WHERE FirstName='Steve' OR FirstName='Laura';
```
Select only employees with FirstName "Steve" AND LastName "Laura":
```{sql, connection=chinook}
SELECT * FROM employees WHERE FirstName='Steve' AND LastName='Laura';
```
As expected, there are no results! There is nobody with the full name "Steve Laura".
</details>
#### List the first 10 tracks from the `tracks` table.
<details>
<summary>Click here for solution</summary>
```{sql, connection=chinook}
SELECT * FROM tracks LIMIT 10;
```
</details>
#### How many rows or records are in the table named `tracks`?
<details>
<summary>Click here for solution</summary>
```{sql, connection=chinook}
SELECT COUNT(*) FROM tracks;
```
</details>
#### Are there any artists with the names: "Elis Regina", "Seu Jorge", or "The Beatles"?
<details>
<summary>Click here for solution</summary>
```{sql, connection=chinook}
SELECT * FROM artists WHERE Name='Elis Regina' OR Name='Seu Jorge' OR Name='The Beatles';
```
</details>
#### What albums did the artist with `ArtistId` of 41 make?
<details>
<summary>Click here for solution</summary>
```{sql, connection=chinook}
SELECT * FROM albums WHERE ArtistId=41;
```
</details>
#### What are the tracks of the album with `AlbumId` of 71? Order the results from most `Milliseconds` to least.
<details>
<summary>Click here for solution</summary>
```{sql, connection=chinook}
SELECT * FROM tracks WHERE AlbumId=71 ORDER BY Milliseconds DESC;
```
</details>
#### What are the tracks of the album with `AlbumId` of 71? Order the results from longest to shortest and convert `Milliseconds` to seconds. Use aliasing to name the calculated field `Seconds`. {#sql-aliasing}
<details>
<summary>Click here for solution</summary>
```{sql, connection=chinook}
SELECT Milliseconds/1000.0 AS Seconds, * FROM tracks WHERE AlbumId=71 ORDER BY Seconds DESC;
```
</details>
#### What are the tracks that are at least 250 seconds long?
<details>
<summary>Click here for solution</summary>
```{sql, connection=chinook}
SELECT Milliseconds/1000.0 AS Seconds, * FROM tracks WHERE Seconds >= 250;
```
</details>
#### What are the tracks that are between 250 and 300 seconds long?
<details>
<summary>Click here for solution</summary>
```{sql, connection=chinook}
SELECT Milliseconds/1000.0 AS Seconds, * FROM tracks WHERE Seconds BETWEEN 250 AND 300 ORDER BY Seconds;
```
</details>
#### What is the `GenreId` of the genre with name `Pop`?
<details>
<summary>Click here for solution</summary>
```{sql, connection=chinook}
SELECT GenreId FROM genres WHERE Name='Pop';
```
</details>
#### What is the average length (in seconds) of a track with genre "Pop"? {#sql-avg}
<details>
<summary>Click here for solution</summary>
```{sql, connection=chinook}
SELECT AVG(Milliseconds/1000.0) AS avg FROM tracks WHERE genreId=9;
```
</details>
#### What is the longest Bossa Nova track (in seconds)? {#sql-max}
<details>
<summary>Click here for solution</summary>
What is the `GenreId` of Bossa Nova?
```{sql, connection=chinook}
SELECT GenreId FROM genres WHERE Name='Bossa Nova';
```
```{sql, connection=chinook}
SELECT *, MAX(Milliseconds/1000.0) AS Seconds FROM tracks WHERE genreId=11;
```
</details>
#### Get the average price per hour for Bossa Nova music (`genreId` of 11).
<details>
<summary>Click here for solution</summary>
```{sql, connection=chinook}
SELECT AVG(UnitPrice/Milliseconds/1000.0/3600) AS 'Price per Hour' FROM tracks WHERE genreId=11;
```
</details>
#### Get the average time (in seconds) for tracks by genre. {#sql-groupby}
<details>
<summary>Click here for solution</summary>
```{sql, connection=chinook}
SELECT genreId, AVG(Milliseconds/1000.0) AS 'Average seconds per track' FROM tracks GROUP BY genreId;
```
We can use an INNER JOIN to get the name of each genre as well. {#sql-inner-join}
```{sql, connection=chinook}
SELECT g.Name, track_time.'Average seconds per track' FROM genres AS g INNER JOIN (SELECT genreId, AVG(Milliseconds/1000.0) AS 'Average seconds per track' FROM tracks GROUP BY genreId) AS track_time ON g.GenreId=track_time.GenreId ORDER BY track_time.'Average seconds per track' DESC;
```
</details>
#### What is the average price per track for each genre?
<details>
<summary>Click here for solution</summary>
```{sql, connection=chinook}
SELECT genreId, AVG(UnitPrice) AS 'Average seconds per track' FROM tracks GROUP BY genreId;
```
</details>
#### What is the average number of tracks per album? {#sql-count}
<details>
<summary>Click here for solution</summary>
```{sql, connection=chinook}
SELECT AVG(trackCount) FROM (SELECT COUNT(*) AS trackCount FROM tracks GROUP BY albumId) AS track_count;
```
</details>
#### What is the average number of tracks per album per genre?
<details>
<summary>Click here for solution</summary>
```{sql, connection=chinook}
SELECT genreId, AVG(trackCount) FROM (SELECT genreId, COUNT(*) AS trackCount FROM tracks GROUP BY albumId) AS track_count GROUP BY genreId;
```
```{sql, connection=chinook}
SELECT Name, avg_track_count.'Average Track Count' FROM genres AS g INNER JOIN (SELECT genreId, AVG(trackCount) AS 'Average Track Count' FROM (SELECT genreId, COUNT(*) AS trackCount FROM tracks GROUP BY albumId) AS track_count GROUP BY genreId) AS avg_track_count ON g.GenreId=avg_track_count.genreId;
```
</details>
The following examples us the `lahman.db` sqlite database.
```{r}
dbListTables(lahman)
```