class: logo-slide

---

class: title-slide

## The Tidyverse

### Applications of Data Science - Class 1

### Giora Simchoni

#### `gsimchoni@gmail.com and add #dsapps in subject`

### Stat. and OR Department, TAU
### 2019-12-12

---

layout: true

<div class="my-footer">
  <span>
    <a href="https://dsapps-2020.github.io/Class_Slides/" target="_blank">Applications of Data Science
    </a>
  </span>
</div>

---

class: section-slide

# I don't need to know about wrangling data, I get by.

---

# So, what's wrong with Excel?

(MS Excel is an amazing piece of software. But it lacks:)

- Structure (or rather, structure is up to the user)
- Variable types
- Automation (you could learn Excel VBA, but the horror)
- Reproducibility
- Open Source
- Extensibility
- Speed and Scale
- Modeling (there *is* a t-test, but the horror)

[MS Excel might be the most dangerous software on the planet](https://www.forbes.com/sites/timworstall/2013/02/13/microsofts-excel-might-be-the-most-dangerous-software-on-the-planet/#667084f5633d) (Tim Worstall, Forbes)

---

# So, what's wrong with base R?

(Base R is an amazing piece of software. But it lacks:)

- Consistency:
  - Function names
  - Function argument names
  - Function argument order
  - Function return types (sometimes within the same function!)
- Meaningful errors and warnings
- Good default values for arguments
- Speed
- Good and easy visualizations
- One other thing

---

### (In) Consistency - Example 1: Strings

.font80percent[

```r
# split a string by pattern: strsplit(string, pattern)
strsplit("Who dis?", " ")
```

```
## [[1]]
## [1] "Who"  "dis?"
```

```r
# find if a pattern exists in a string: grepl(pattern, string)
grepl("di", "Who dis?")
```

```
## [1] TRUE
```

```r
# substitute a pattern in a string: sub(pattern, replacement, string)
sub("di", "thi", "Who dis?")
```

```
## [1] "Who this?"
```

```r
# length of a string: nchar(string); length of object: length(obj)
c(nchar("Who dis?"), length("Who dis?"))
```

```
## [1] 8 1
```

]

---

### (In) Consistency - Example 2: Models

```r
n <- 10000
x1 <- runif(n)
x2 <- runif(n)
t <- 1 + 2 * x1 + 3 * x2
y <- rbinom(n, 1, 1 / (1 + exp(-t)))
```

```r
# stats::glm: formula interface, numeric 0/1 response
glm(y ~ x1 + x2, family = "binomial")
```

```r
# glmnet::glmnet: matrix of predictors, no formula
glmnet(as.matrix(cbind(x1, x2)), as.factor(y), family = "binomial")
```

```r
# randomForest::randomForest: formula, with the response converted to a factor for classification
randomForest(as.factor(y) ~ x1 + x2)
```

```r
# gbm::gbm: formula plus a data frame
gbm(y ~ x1 + x2, data = data.frame(x1 = x1, x2 = x2, y = y))
```

😱

---

### (Un) Meaningful Errors - Example

```r
df <- data.frame(Education = 1:5, Ethnicity = c(2, 4, 5, 2, 1))
# "Eduction" is misspelled, so df$Eduction is NULL; the error below never says so
table(df$Eduction, df$Ethnicity)
```

<pre style="color: red;"><code>## Error in table(df$Eduction, df$Ethnicity): all arguments must have the same length
</code></pre>

---

### (Bad) Default Values - Example

![](images/bad_args_table.png)

```r
# before R 4.0.0, read.csv() converted strings to factors by default
df <- read.csv("../data/bad_args_test.csv")
df$col3
```

```
## [1] a b c d
## Levels: a b c d
```

```r
df <- read.csv("../data/bad_args_test.csv", stringsAsFactors = FALSE)
df$col3
```

```
## [1] "a" "b" "c" "d"
```

---

### (No) Speed - Example

```r
file_path <- "../data/mediocre_file.csv"
df <- read.csv(file_path)
dim(df)
```

```
## [1] 9180   14
```

```r
library(microbenchmark)
library(readr)  # provides read_csv() and cols()

microbenchmark(
  read_base = read.csv(file_path),
  read_tidy = read_csv(file_path, col_types = cols()),
  read_dt = data.table::fread(file_path),
  times = 10)
```

```
## Unit: milliseconds
##       expr        min       lq      mean     median       uq      max
##  read_base 263.938201 265.3144 267.12326 267.015151 268.4959 271.9319
##  read_tidy  18.726101  19.2148  24.95933  19.589951  22.2064  63.7732
##    read_dt   8.751801   8.8653  14.22467   9.493901  10.9003  55.6528
##  neval
##     10
##     10
##     10
```

---

class: section-slide

# Detour: The OKCupid Dataset

---

## The OKCupid Dataset

- ~60K active OKCupid users scraped in June 2012
- 35K Male, 25K Female (there was less awareness of non-binary identities back then)
- Answers to questions like:
  - Body Type
  - Diet
  - Substance Abuse
  - Education
  - Do you like pets?
- Open questions, e.g. "On a typical Friday night I am..."
- And the more boring demographic details like age, height, location, sign, religion etc.
- See [here](https://github.com/rudeboybert/JSE_OkCupid/blob/master/okcupid_codebook.txt) for the full codebook

---

class: section-slide

# End of Detour

---

### (Not) Good Visualizations - Example

.font80percent[

```r
okcupid <- read_csv("../data/okcupid.csv.zip", col_types = cols())
okcupid$income[okcupid$income == -1] <- NA
okcupid$height_cm <- okcupid$height * 2.54
```

```r
plot(okcupid$height_cm, log10(okcupid$income + 1),
     col = c("red", "green")[as.factor(okcupid$sex)])
```

<img src="images/Viz-Base-1.png" width="40%" />

]

---

```r
ggplot(okcupid, aes(height_cm, log10(income + 1), color = sex)) +
  geom_point()
```

<pre style="color: red;"><code>## Warning: Removed 48442 rows containing missing values (geom_point).
</code></pre><img src="images/Viz-Tidy-1.png" width="50%" />

---

## One other thing

Manager: "Give me the average income of women respondents above age 30 grouped by sexual orientation!"

You:

```r
mean_bi <- mean(okcupid$income[okcupid$sex == "f" &
                                 okcupid$age > 30 &
                                 okcupid$orientation == "bisexual"],
                na.rm = TRUE)
mean_gay <- mean(okcupid$income[okcupid$sex == "f" &
                                  okcupid$age > 30 &
                                  okcupid$orientation == "gay"],
                 na.rm = TRUE)
mean_straight <- mean(okcupid$income[okcupid$sex == "f" &
                                       okcupid$age > 30 &
                                       okcupid$orientation == "straight"],
                      na.rm = TRUE)
data.frame(orientation = c("bisexual", "gay", "straight"),
           income_mean = c(mean_bi, mean_gay, mean_straight))
```

```
##   orientation income_mean
## 1    bisexual   133421.05
## 2         gay    86489.36
## 3    straight    85219.74
```

---

Or the slightly better you:

```r
mean_income_function <- function(orientation) {
  mean(okcupid$income[okcupid$sex == "f" &
                        okcupid$age > 30 &
                        okcupid$orientation == orientation],
       na.rm = TRUE)
}
mean_bi <- mean_income_function("bisexual")
mean_gay <- mean_income_function("gay")
mean_straight <- mean_income_function("straight")
data.frame(orientation = c("bisexual", "gay", "straight"),
           income_mean = c(mean_bi, mean_gay, mean_straight))
```

```
##   orientation income_mean
## 1    bisexual   133421.05
## 2         gay    86489.36
## 3    straight    85219.74
```

---

Or the even better you:

```r
orientations <- c("bisexual", "gay", "straight")
income_means <- numeric(3)
for (i in seq_along(orientations)) {
  income_means[i] <- mean_income_function(orientations[i])
}
data.frame(orientation = orientations, income_mean = income_means)
```

```
##   orientation income_mean
## 1    bisexual   133421.05
## 2         gay    86489.36
## 3    straight    85219.74
```

---

Or the best you:

```r
okcupid_females_over30 <- with(okcupid, okcupid[sex == "f" & age > 30, ])
aggregate(okcupid_females_over30$income,
          by = list(orientation = okcupid_females_over30$orientation),
          FUN = mean, na.rm = TRUE)
```

```
##   orientation         x
## 1    bisexual 133421.05
## 2         gay  86489.36
## 3    straight  85219.74
```

<br>

Manager: "What? Why would bisexual women have a higher income than straight or gay women? Could you add the median, trimmed mean, standard error and n?"

You: 😱

---

class: section-slide

# The Tidyverse

---

## What *is* The Tidyverse?

> The [tidyverse](https://www.tidyverse.org/) is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.

- `tibble`: the `data.frame` re-imagined
- `readr`: importing/exporting (mostly rectangular) data for humans
- `dplyr` + `tidyr`: a grammar of data manipulation
- `purrr`: functional programming in R
- `stringr`: string manipulation
- `ggplot2`: a grammar of graphics

---

The above can all be installed and loaded under the `tidyverse` package:

```r
library(tidyverse)
```

Many more:

- `lubridate`: manipulating dates
- `modelr`, `recipes`, `rsample`, `infer`: tidy modeling/statistics
- `rvest`: web scraping
- `tidytext`: tidy text analysis (life saver)
- `tidygraph` + `ggraph`: manipulating and plotting networks
- `glue`: print like a boss
- countless `gg` extensions (`ggmosaic`, `ggbeeswarm`, `gganimate`, `ggridges` etc.)

---

# What's so great about the Tidyverse?

- Tidy Data
- Consistency (in function names, args, return types, documentation)
- The Pipe
- Speed (C++ under the hood)
- `ggplot2`
- The Community

---

## Tidy Data

- Each variable must have its own column.
- Each observation must have its own row.
- Each value must have its own cell.

<br>

<img src="images/tidy_data.png" style="width: 90%" />

---

### Which one of these datasets is tidy? (I)

```r
table1
```

```
## # A tibble: 315 x 4
##    religion      yob n_straight n_total
##    <chr>       <dbl>      <dbl>   <dbl>
##  1 atheist      1950         26      29
##  2 buddhist     1950          6       6
##  3 christian    1950         28      32
##  4 hindu        1950          0       0
##  5 jewish       1950         21      24
##  6 muslim       1950          0       0
##  7 unspecified  1950         71      76
##  8 atheist      1951         31      33
##  9 buddhist     1951         11      11
## 10 christian    1951         23      24
## # ... with 305 more rows
```

---

### Which one of these datasets is tidy? (II)

```r
table2
```

```
## # A tibble: 630 x 4
##    religion    yob type         n
##    <chr>     <dbl> <chr>    <dbl>
##  1 atheist    1950 straight    26
##  2 atheist    1950 total       29
##  3 buddhist   1950 straight     6
##  4 buddhist   1950 total        6
##  5 christian  1950 straight    28
##  6 christian  1950 total       32
##  7 hindu      1950 straight     0
##  8 hindu      1950 total        0
##  9 jewish     1950 straight    21
## 10 jewish     1950 total       24
## # ... with 620 more rows
```

---

### Which one of these datasets is tidy? (III)

```r
table3
```

```
## # A tibble: 315 x 3
##    religion      yob pct_straight
##    <chr>       <dbl> <chr>
##  1 atheist      1950 26/29
##  2 buddhist     1950 6/6
##  3 christian    1950 28/32
##  4 hindu        1950 0/0
##  5 jewish       1950 21/24
##  6 muslim       1950 0/0
##  7 unspecified  1950 71/76
##  8 atheist      1951 31/33
##  9 buddhist     1951 11/11
## 10 christian    1951 23/24
## # ... with 305 more rows
```

---

### Which one of these datasets is tidy? (IV)

```r
table4
```

```
## # A tibble: 7 x 91
##   religion n_total_1950 n_total_1951 n_total_1952 n_total_1953 n_total_1954
##   <chr>           <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
## 1 atheist            29           33           34           37           40
## 2 buddhist            6           11           14           16           11
## 3 christi~           32           24           37           47           37
## 4 hindu               0            0            0            1            1
## 5 jewish             24           29           27           23           25
## 6 muslim              0            0            0            0            0
## 7 unspeci~           76           79           83           97           83
## # ... with 85 more variables: n_total_1955 <dbl>, n_total_1956 <dbl>,
## #   n_total_1957 <dbl>, n_total_1958 <dbl>, n_total_1959 <dbl>,
## #   n_total_1960 <dbl>, n_total_1961 <dbl>, n_total_1962 <dbl>,
## #   n_total_1963 <dbl>, n_total_1964 <dbl>, n_total_1965 <dbl>,
## #   n_total_1966 <dbl>, n_total_1967 <dbl>, n_total_1968 <dbl>,
## #   n_total_1969 <dbl>, n_total_1970 <dbl>, n_total_1971 <dbl>,
## #   n_total_1972 <dbl>, n_total_1973 <dbl>, n_total_1974 <dbl>,
## #   n_total_1975 <dbl>, n_total_1976 <dbl>, n_total_1977 <dbl>,
## #   n_total_1978 <dbl>, n_total_1979 <dbl>, n_total_1980 <dbl>,
## #   n_total_1981 <dbl>, n_total_1982 <dbl>, n_total_1983 <dbl>,
## #   n_total_1984 <dbl>, n_total_1985 <dbl>, n_total_1986 <dbl>,
## #   n_total_1987 <dbl>, n_total_1988 <dbl>, n_total_1989 <dbl>,
## #   n_total_1990 <dbl>, n_total_1991 <dbl>, n_total_1992 <dbl>,
## #   n_total_1993 <dbl>, n_total_1994 <dbl>, n_straight_1950 <dbl>,
## #   n_straight_1951 <dbl>, n_straight_1952 <dbl>, n_straight_1953 <dbl>,
## #   n_straight_1954 <dbl>, n_straight_1955 <dbl>, n_straight_1956 <dbl>,
## #   n_straight_1957 <dbl>, n_straight_1958 <dbl>, n_straight_1959 <dbl>,
## #   n_straight_1960 <dbl>, n_straight_1961 <dbl>, n_straight_1962 <dbl>,
## #   n_straight_1963 <dbl>, n_straight_1964 <dbl>, n_straight_1965 <dbl>,
## #   n_straight_1966 <dbl>, n_straight_1967 <dbl>, n_straight_1968 <dbl>,
## #   n_straight_1969 <dbl>, n_straight_1970 <dbl>, n_straight_1971 <dbl>,
## #   n_straight_1972 <dbl>, n_straight_1973 <dbl>, n_straight_1974 <dbl>,
## #   n_straight_1975 <dbl>, n_straight_1976 <dbl>, n_straight_1977 <dbl>,
## #   n_straight_1978 <dbl>, n_straight_1979 <dbl>, n_straight_1980 <dbl>,
## #   n_straight_1981 <dbl>, n_straight_1982 <dbl>, n_straight_1983 <dbl>,
## #   n_straight_1984 <dbl>, n_straight_1985 <dbl>, n_straight_1986 <dbl>,
## #   n_straight_1987 <dbl>, n_straight_1988 <dbl>, n_straight_1989 <dbl>,
## #   n_straight_1990 <dbl>, n_straight_1991 <dbl>, n_straight_1992 <dbl>,
## #   n_straight_1993 <dbl>, n_straight_1994 <dbl>
```

---

### Why Tidy?

> Happy families are all alike; every unhappy family is unhappy in its own way. (Leo Tolstoy)

<br>

> It allows R’s vectorised nature to shine. (Hadley Wickham)

---

### A Tidy dataset will be much easier to transform

```r
table1$pct_straight <- table1$n_straight / table1$n_total
table1
```

```
## # A tibble: 315 x 5
##    religion      yob n_straight n_total pct_straight
##    <chr>       <dbl>      <dbl>   <dbl>        <dbl>
##  1 atheist      1950         26      29        0.897
##  2 buddhist     1950          6       6        1
##  3 christian    1950         28      32        0.875
##  4 hindu        1950          0       0      NaN
##  5 jewish       1950         21      24        0.875
##  6 muslim       1950          0       0      NaN
##  7 unspecified  1950         71      76        0.934
##  8 atheist      1951         31      33        0.939
##  9 buddhist     1951         11      11        1
## 10 christian    1951         23      24        0.958
## # ... with 305 more rows
```

---

### A Tidy dataset will be much easier to plot

```r
ggplot(table1, aes(x = yob, y = pct_straight, color = religion)) +
  geom_smooth(method = "loess", se = FALSE)
```

<img src="images/Table1-Plot-1.png" width="50%" />

---

class: section-slide

# Detour: The `tibble`

---

### The `tibble`: the `data.frame` re-imagined

- Prints nicer:

```r
tib1 <- tibble(day = lubridate::today() + runif(1e3) * 30,
               type = sample(letters, 1e3, replace = TRUE),
               quantity = sample(seq(0, 100, 10), 1e3, replace = TRUE))
tib1
```

```
## # A tibble: 1,000 x 3
##    day        type  quantity
##    <date>     <chr>    <dbl>
##  1 2020-01-02 p           40
##  2 2019-12-29 s           90
##  3 2019-12-25 o          100
##  4 2019-12-13 r           40
##  5 2019-12-29 k           10
##  6 2020-01-09 w          100
##  7 2020-01-07 r           70
##  8 2020-01-02 b          100
##  9 2020-01-05 s           50
## 10 2020-01-07 o           10
## # ...
with 990 more rows ``` --- ```r df1 <- data.frame(day = lubridate::today() + runif(1e3) * 30, type = sample(letters, 1e3, replace = TRUE), quantity = sample(seq(0, 100, 10), 1e3, replace = TRUE)) df1 ``` ``` ## day type quantity ## 1 2019-12-13 l 10 ## 2 2019-12-24 p 40 ## 3 2020-01-05 y 50 ## 4 2019-12-18 z 10 ## 5 2020-01-03 v 90 ## 6 2019-12-26 h 50 ## 7 2019-12-27 z 80 ## 8 2019-12-13 k 90 ## 9 2019-12-31 d 40 ## 10 2020-01-08 w 30 ## 11 2020-01-08 v 100 ## 12 2019-12-13 r 90 ## 13 2019-12-14 a 70 ## 14 2019-12-24 m 80 ## 15 2019-12-30 k 70 ## 16 2019-12-15 i 20 ## 17 2019-12-22 u 90 ## 18 2019-12-17 f 40 ## 19 2019-12-17 x 100 ## 20 2019-12-29 t 10 ## 21 2020-01-10 h 70 ## 22 2019-12-24 t 10 ## 23 2020-01-01 n 0 ## 24 2020-01-05 f 90 ## 25 2019-12-19 w 20 ## 26 2019-12-21 n 50 ## 27 2020-01-09 s 80 ## 28 2019-12-21 g 80 ## 29 2019-12-31 c 10 ## 30 2020-01-08 t 100 ## 31 2019-12-30 p 0 ## 32 2019-12-24 d 50 ## 33 2019-12-21 x 30 ## 34 2019-12-25 s 20 ## 35 2020-01-08 o 60 ## 36 2020-01-07 m 100 ## 37 2019-12-26 r 80 ## 38 2019-12-24 i 20 ## 39 2019-12-18 e 20 ## 40 2020-01-06 y 60 ## 41 2019-12-30 v 80 ## 42 2019-12-23 k 50 ## 43 2019-12-29 k 80 ## 44 2019-12-29 t 100 ## 45 2020-01-06 s 100 ## 46 2019-12-22 j 20 ## 47 2020-01-01 w 10 ## 48 2019-12-15 c 20 ## 49 2019-12-18 d 50 ## 50 2019-12-25 q 80 ## 51 2019-12-28 y 50 ## 52 2019-12-21 x 30 ## 53 2019-12-29 w 80 ## 54 2019-12-25 n 30 ## 55 2019-12-22 m 40 ## 56 2020-01-05 k 10 ## 57 2019-12-14 z 0 ## 58 2020-01-09 h 50 ## 59 2019-12-30 f 0 ## 60 2020-01-10 c 10 ## 61 2019-12-31 u 40 ## 62 2019-12-21 i 40 ## 63 2019-12-21 u 50 ## 64 2019-12-15 q 50 ## 65 2020-01-05 q 10 ## 66 2020-01-02 i 0 ## 67 2019-12-26 r 90 ## 68 2020-01-09 b 0 ## 69 2019-12-28 q 90 ## 70 2019-12-18 h 30 ## 71 2019-12-23 q 60 ## 72 2020-01-06 d 60 ## 73 2019-12-17 d 20 ## 74 2020-01-09 q 40 ## 75 2019-12-29 c 20 ## 76 2020-01-04 t 20 ## 77 2020-01-08 v 20 ## 78 2019-12-15 o 30 ## 79 2019-12-31 u 80 ## 80 2019-12-23 i 20 ## 81 2019-12-15 m 
30 ## 82 2020-01-01 n 20 ## 83 2019-12-25 b 10 ## 84 2020-01-09 j 90 ## 85 2019-12-25 h 90 ## 86 2019-12-14 d 80 ## 87 2019-12-25 w 10 ## 88 2019-12-21 p 90 ## 89 2019-12-15 c 70 ## 90 2019-12-12 a 60 ## 91 2019-12-31 u 60 ## 92 2019-12-15 b 0 ## 93 2019-12-21 k 0 ## 94 2019-12-16 s 100 ## 95 2020-01-02 v 40 ## 96 2019-12-23 a 90 ## 97 2020-01-09 x 60 ## 98 2019-12-27 n 0 ## 99 2019-12-31 d 100 ## 100 2019-12-23 u 80 ## 101 2019-12-25 t 0 ## 102 2019-12-22 g 70 ## 103 2019-12-12 x 100 ## 104 2020-01-06 n 100 ## 105 2019-12-20 y 10 ## 106 2019-12-25 f 100 ## 107 2019-12-24 n 100 ## 108 2020-01-01 u 70 ## 109 2020-01-01 z 30 ## 110 2019-12-15 y 60 ## 111 2020-01-01 c 60 ## 112 2020-01-01 y 60 ## 113 2020-01-09 i 30 ## 114 2019-12-18 p 40 ## 115 2020-01-05 h 90 ## 116 2019-12-26 r 20 ## 117 2019-12-12 q 40 ## 118 2019-12-30 e 100 ## 119 2019-12-22 e 30 ## 120 2020-01-10 h 50 ## 121 2019-12-24 z 60 ## 122 2019-12-28 a 70 ## 123 2020-01-02 m 80 ## 124 2019-12-28 k 80 ## 125 2019-12-24 s 80 ## 126 2019-12-25 a 100 ## 127 2020-01-02 c 20 ## 128 2020-01-06 z 70 ## 129 2020-01-06 g 10 ## 130 2020-01-03 l 100 ## 131 2019-12-26 w 20 ## 132 2019-12-24 t 20 ## 133 2019-12-21 p 60 ## 134 2020-01-10 p 30 ## 135 2019-12-12 m 40 ## 136 2019-12-16 q 90 ## 137 2019-12-20 o 20 ## 138 2019-12-21 r 80 ## 139 2019-12-12 v 70 ## 140 2019-12-18 r 10 ## 141 2019-12-24 f 40 ## 142 2019-12-18 a 0 ## 143 2019-12-13 w 80 ## 144 2019-12-21 l 50 ## 145 2019-12-27 l 10 ## 146 2019-12-24 y 80 ## 147 2019-12-13 i 70 ## 148 2019-12-30 a 30 ## 149 2019-12-12 l 20 ## 150 2019-12-28 t 50 ## 151 2019-12-12 j 40 ## 152 2019-12-17 t 50 ## 153 2019-12-22 t 100 ## 154 2019-12-29 b 60 ## 155 2020-01-07 q 0 ## 156 2019-12-28 z 80 ## 157 2020-01-10 h 70 ## 158 2020-01-10 u 10 ## 159 2019-12-18 n 50 ## 160 2019-12-26 s 70 ## 161 2019-12-22 g 90 ## 162 2020-01-03 f 60 ## 163 2019-12-28 m 10 ## 164 2020-01-01 w 20 ## 165 2019-12-14 a 100 ## 166 2019-12-28 v 100 ## 167 2020-01-03 j 60 ## 168 2020-01-03 o 70 ## 169 
2019-12-30 y 70 ## 170 2019-12-28 r 40 ## 171 2019-12-26 u 70 ## 172 2019-12-19 k 40 ## 173 2019-12-19 b 10 ## 174 2020-01-07 m 40 ## 175 2019-12-16 e 30 ## 176 2019-12-15 d 40 ## 177 2019-12-19 a 30 ## 178 2020-01-03 p 90 ## 179 2019-12-16 r 70 ## 180 2019-12-24 z 0 ## 181 2020-01-10 l 20 ## 182 2019-12-15 a 80 ## 183 2019-12-17 u 30 ## 184 2020-01-07 b 90 ## 185 2019-12-25 y 90 ## 186 2020-01-07 v 50 ## 187 2019-12-21 n 30 ## 188 2019-12-29 n 10 ## 189 2019-12-27 p 80 ## 190 2020-01-07 e 40 ## 191 2019-12-16 u 10 ## 192 2020-01-04 d 100 ## 193 2019-12-21 a 90 ## 194 2020-01-10 a 10 ## 195 2020-01-04 p 20 ## 196 2019-12-12 k 70 ## 197 2019-12-19 v 40 ## 198 2019-12-16 g 100 ## 199 2019-12-25 j 70 ## 200 2019-12-26 u 50 ## 201 2019-12-14 v 30 ## 202 2020-01-03 i 10 ## 203 2019-12-12 u 30 ## 204 2019-12-18 k 70 ## 205 2019-12-14 e 0 ## 206 2019-12-30 x 10 ## 207 2019-12-25 v 90 ## 208 2020-01-08 r 20 ## 209 2020-01-04 o 70 ## 210 2020-01-05 f 20 ## 211 2019-12-18 s 10 ## 212 2020-01-02 e 60 ## 213 2019-12-26 f 10 ## 214 2019-12-14 i 50 ## 215 2020-01-09 a 70 ## 216 2019-12-16 x 40 ## 217 2019-12-23 t 80 ## 218 2019-12-21 j 10 ## 219 2020-01-03 e 50 ## 220 2020-01-09 m 30 ## 221 2020-01-03 v 50 ## 222 2019-12-14 x 80 ## 223 2019-12-25 l 0 ## 224 2019-12-28 i 10 ## 225 2020-01-09 z 0 ## 226 2020-01-05 g 100 ## 227 2020-01-04 n 30 ## 228 2019-12-26 j 100 ## 229 2020-01-04 u 80 ## 230 2019-12-13 o 60 ## 231 2020-01-07 y 80 ## 232 2019-12-16 n 50 ## 233 2019-12-31 t 50 ## 234 2020-01-01 r 100 ## 235 2020-01-02 x 70 ## 236 2019-12-27 j 60 ## 237 2019-12-25 e 40 ## 238 2019-12-15 z 90 ## 239 2020-01-09 k 20 ## 240 2019-12-29 r 50 ## 241 2019-12-25 y 10 ## 242 2019-12-27 p 80 ## 243 2020-01-06 i 80 ## 244 2019-12-19 b 40 ## 245 2019-12-20 z 30 ## 246 2019-12-15 z 70 ## 247 2020-01-05 g 50 ## 248 2019-12-27 n 50 ## 249 2020-01-09 e 10 ## 250 2019-12-16 h 30 ## 251 2019-12-16 z 100 ## 252 2019-12-24 t 10 ## 253 2019-12-26 v 80 ## 254 2020-01-01 c 90 ## 255 2020-01-05 b 30 ## 
256 2019-12-30 j 60 ## 257 2020-01-09 q 0 ## 258 2019-12-15 m 60 ## 259 2019-12-23 t 40 ## 260 2020-01-08 x 70 ## 261 2019-12-15 c 0 ## 262 2019-12-27 n 100 ## 263 2020-01-07 q 90 ## 264 2020-01-01 m 80 ## 265 2019-12-31 g 60 ## 266 2020-01-07 j 100 ## 267 2019-12-27 b 50 ## 268 2020-01-08 j 60 ## 269 2019-12-27 o 90 ## 270 2019-12-24 r 80 ## 271 2019-12-18 e 90 ## 272 2019-12-14 k 100 ## 273 2020-01-08 o 50 ## 274 2019-12-22 l 80 ## 275 2020-01-10 z 90 ## 276 2020-01-10 i 0 ## 277 2020-01-09 m 40 ## 278 2019-12-23 l 100 ## 279 2020-01-05 r 10 ## 280 2019-12-21 t 70 ## 281 2020-01-08 c 10 ## 282 2019-12-17 o 70 ## 283 2019-12-30 b 0 ## 284 2019-12-13 i 30 ## 285 2020-01-01 e 90 ## 286 2019-12-16 l 30 ## 287 2019-12-21 c 80 ## 288 2020-01-01 l 0 ## 289 2020-01-10 e 90 ## 290 2020-01-04 x 70 ## 291 2019-12-16 w 80 ## 292 2019-12-18 k 40 ## 293 2019-12-26 j 90 ## 294 2020-01-06 o 60 ## 295 2020-01-04 s 100 ## 296 2019-12-30 t 0 ## 297 2019-12-12 w 70 ## 298 2020-01-02 d 50 ## 299 2019-12-26 p 80 ## 300 2019-12-25 o 20 ## 301 2019-12-16 t 90 ## 302 2019-12-19 t 0 ## 303 2019-12-17 b 20 ## 304 2020-01-04 g 30 ## 305 2020-01-06 k 20 ## 306 2019-12-12 g 40 ## 307 2019-12-23 q 10 ## 308 2019-12-16 w 30 ## 309 2019-12-17 n 60 ## 310 2019-12-24 e 100 ## 311 2020-01-07 i 40 ## 312 2019-12-14 n 70 ## 313 2020-01-05 g 80 ## 314 2019-12-15 f 50 ## 315 2020-01-06 m 40 ## 316 2019-12-22 b 90 ## 317 2019-12-29 h 40 ## 318 2019-12-26 q 30 ## 319 2019-12-24 m 20 ## 320 2020-01-05 i 20 ## 321 2020-01-04 t 50 ## 322 2020-01-03 c 40 ## 323 2019-12-30 d 30 ## 324 2019-12-17 d 90 ## 325 2019-12-23 y 30 ## 326 2019-12-31 l 60 ## 327 2020-01-07 c 10 ## 328 2020-01-04 d 100 ## 329 2019-12-13 e 20 ## 330 2019-12-14 n 100 ## 331 2019-12-18 a 50 ## 332 2019-12-24 g 40 ## 333 2019-12-18 f 60 ## 334 2020-01-04 z 100 ## 335 2019-12-21 k 100 ## 336 2020-01-01 y 60 ## 337 2019-12-21 b 70 ## 338 2020-01-09 g 10 ## 339 2019-12-13 a 50 ## 340 2019-12-16 f 20 ## 341 2019-12-12 k 90 ## 342 2019-12-16 e 
60 ## 343 2020-01-09 l 60 ## 344 2019-12-31 h 70 ## 345 2019-12-21 y 70 ## 346 2019-12-14 z 90 ## 347 2020-01-10 j 20 ## 348 2019-12-15 x 70 ## 349 2019-12-24 o 80 ## 350 2019-12-30 j 70 ## 351 2019-12-16 r 70 ## 352 2020-01-06 d 10 ## 353 2019-12-30 u 100 ## 354 2019-12-23 g 70 ## 355 2019-12-19 o 0 ## 356 2019-12-25 p 90 ## 357 2020-01-03 d 80 ## 358 2020-01-02 s 60 ## 359 2019-12-31 k 20 ## 360 2019-12-12 b 80 ## 361 2019-12-13 n 100 ## 362 2019-12-15 x 60 ## 363 2019-12-19 j 0 ## 364 2019-12-13 w 60 ## 365 2020-01-01 i 10 ## 366 2020-01-07 a 40 ## 367 2019-12-19 w 100 ## 368 2019-12-18 a 30 ## 369 2019-12-17 l 40 ## 370 2020-01-09 d 40 ## 371 2020-01-04 p 90 ## 372 2020-01-08 g 20 ## 373 2019-12-31 r 70 ## 374 2020-01-08 e 20 ## 375 2020-01-08 x 80 ## 376 2020-01-10 x 0 ## 377 2019-12-31 f 20 ## 378 2020-01-10 v 10 ## 379 2019-12-25 s 40 ## 380 2019-12-30 v 10 ## 381 2020-01-06 l 100 ## 382 2019-12-26 l 50 ## 383 2019-12-25 r 10 ## 384 2020-01-06 g 0 ## 385 2020-01-01 z 0 ## 386 2019-12-18 v 90 ## 387 2020-01-05 p 10 ## 388 2019-12-17 u 90 ## 389 2019-12-21 a 90 ## 390 2020-01-10 i 80 ## 391 2019-12-14 q 10 ## 392 2019-12-21 w 30 ## 393 2019-12-22 b 40 ## 394 2019-12-25 u 70 ## 395 2019-12-18 i 20 ## 396 2019-12-14 a 100 ## 397 2019-12-29 s 10 ## 398 2019-12-19 n 80 ## 399 2019-12-15 n 100 ## 400 2019-12-21 f 0 ## 401 2020-01-01 x 100 ## 402 2020-01-03 z 20 ## 403 2020-01-05 v 20 ## 404 2019-12-18 m 30 ## 405 2019-12-26 y 0 ## 406 2020-01-10 f 50 ## 407 2019-12-21 j 80 ## 408 2019-12-17 p 50 ## 409 2020-01-01 p 90 ## 410 2020-01-06 y 50 ## 411 2020-01-05 a 50 ## 412 2020-01-01 y 90 ## 413 2020-01-04 z 30 ## 414 2019-12-31 h 10 ## 415 2019-12-15 m 40 ## 416 2019-12-21 i 100 ## 417 2019-12-15 f 80 ## 418 2019-12-15 f 90 ## 419 2019-12-14 q 50 ## 420 2019-12-15 j 70 ## 421 2020-01-01 f 70 ## 422 2019-12-17 i 80 ## 423 2019-12-20 t 100 ## 424 2019-12-31 t 0 ## 425 2020-01-03 z 50 ## 426 2020-01-09 m 40 ## 427 2019-12-24 g 40 ## 428 2019-12-17 g 30 ## 429 2019-12-18 
m 0 ## 430 2019-12-30 y 10 ## 431 2020-01-09 o 60 ## 432 2020-01-03 y 90 ## 433 2019-12-24 h 50 ## 434 2019-12-17 m 70 ## 435 2020-01-06 v 80 ## 436 2019-12-21 h 90 ## 437 2020-01-05 e 10 ## 438 2019-12-28 a 70 ## 439 2019-12-14 w 10 ## 440 2019-12-15 n 50 ## 441 2019-12-15 u 0 ## 442 2019-12-28 b 0 ## 443 2019-12-24 g 20 ## 444 2019-12-20 v 80 ## 445 2019-12-13 v 20 ## 446 2019-12-23 x 20 ## 447 2019-12-12 a 90 ## 448 2019-12-27 b 90 ## 449 2019-12-22 x 0 ## 450 2020-01-07 e 90 ## 451 2019-12-16 y 0 ## 452 2019-12-24 d 10 ## 453 2019-12-26 b 50 ## 454 2020-01-06 k 90 ## 455 2019-12-13 u 70 ## 456 2019-12-23 y 70 ## 457 2019-12-24 a 80 ## 458 2019-12-21 c 90 ## 459 2020-01-04 n 70 ## 460 2020-01-01 l 70 ## 461 2019-12-13 q 80 ## 462 2020-01-02 d 70 ## 463 2019-12-21 u 30 ## 464 2019-12-16 n 70 ## 465 2019-12-16 d 10 ## 466 2020-01-01 t 10 ## 467 2019-12-25 v 0 ## 468 2019-12-15 b 30 ## 469 2019-12-31 b 30 ## 470 2019-12-15 y 10 ## 471 2019-12-29 z 0 ## 472 2019-12-18 f 30 ## 473 2019-12-22 g 10 ## 474 2019-12-20 x 20 ## 475 2019-12-12 c 30 ## 476 2019-12-16 c 50 ## 477 2019-12-12 a 10 ## 478 2020-01-04 a 100 ## 479 2020-01-08 y 20 ## 480 2020-01-06 d 0 ## 481 2019-12-27 q 60 ## 482 2019-12-31 z 50 ## 483 2019-12-24 x 30 ## 484 2020-01-09 u 20 ## 485 2019-12-17 s 10 ## 486 2020-01-02 f 40 ## 487 2020-01-09 l 10 ## 488 2020-01-08 g 70 ## 489 2019-12-22 r 70 ## 490 2020-01-05 v 90 ## 491 2019-12-18 f 90 ## 492 2019-12-13 f 30 ## 493 2019-12-12 x 10 ## 494 2019-12-14 h 70 ## 495 2020-01-03 v 60 ## 496 2019-12-20 q 10 ## 497 2020-01-02 b 20 ## 498 2019-12-27 c 30 ## 499 2020-01-02 b 20 ## 500 2019-12-17 d 0 ## 501 2020-01-06 h 50 ## 502 2020-01-03 e 0 ## 503 2019-12-28 a 100 ## 504 2019-12-29 q 30 ## 505 2019-12-13 r 30 ## 506 2019-12-29 s 30 ## 507 2020-01-04 i 50 ## 508 2019-12-27 x 30 ## 509 2019-12-25 f 20 ## 510 2019-12-29 h 100 ## 511 2019-12-27 v 10 ## 512 2020-01-08 t 50 ## 513 2019-12-17 r 90 ## 514 2020-01-08 f 20 ## 515 2019-12-13 r 40 ## 516 2019-12-16 r 80 
## 517 2020-01-08 m 20 ## 518 2019-12-16 m 0 ## 519 2019-12-31 d 0 ## 520 2019-12-21 a 30 ## 521 2019-12-25 k 100 ## 522 2019-12-20 q 90 ## 523 2020-01-06 w 0 ## 524 2020-01-05 t 70 ## 525 2019-12-25 h 90 ## 526 2019-12-26 y 70 ## 527 2019-12-23 q 90 ## 528 2019-12-14 c 90 ## 529 2019-12-13 l 10 ## 530 2020-01-07 r 40 ## 531 2019-12-19 p 90 ## 532 2020-01-07 h 20 ## 533 2020-01-04 k 50 ## 534 2019-12-26 y 100 ## 535 2020-01-06 t 0 ## 536 2019-12-20 p 10 ## 537 2019-12-23 q 40 ## 538 2019-12-28 c 10 ## 539 2019-12-13 t 100 ## 540 2019-12-27 a 80 ## 541 2020-01-04 g 60 ## 542 2020-01-01 s 30 ## 543 2019-12-28 j 100 ## 544 2019-12-13 t 0 ## 545 2019-12-29 n 30 ## 546 2019-12-13 r 40 ## 547 2019-12-14 r 100 ## 548 2019-12-13 v 10 ## 549 2020-01-08 g 10 ## 550 2020-01-08 a 90 ## 551 2019-12-30 o 70 ## 552 2019-12-29 c 0 ## 553 2020-01-02 r 90 ## 554 2019-12-17 e 30 ## 555 2020-01-09 n 30 ## 556 2020-01-10 b 40 ## 557 2019-12-27 e 70 ## 558 2019-12-17 t 50 ## 559 2019-12-20 c 50 ## 560 2020-01-04 n 0 ## 561 2020-01-03 l 70 ## 562 2020-01-05 c 30 ## 563 2019-12-27 h 50 ## 564 2019-12-21 u 90 ## 565 2019-12-19 l 80 ## 566 2019-12-31 l 100 ## 567 2019-12-22 e 100 ## 568 2020-01-08 g 20 ## 569 2019-12-31 d 20 ## 570 2019-12-28 f 60 ## 571 2019-12-12 i 50 ## 572 2019-12-14 m 60 ## 573 2020-01-04 q 90 ## 574 2020-01-05 u 0 ## 575 2020-01-06 z 60 ## 576 2020-01-10 p 100 ## 577 2019-12-20 t 50 ## 578 2019-12-18 y 10 ## 579 2019-12-22 w 90 ## 580 2019-12-21 r 90 ## 581 2020-01-09 o 30 ## 582 2019-12-13 c 60 ## 583 2019-12-25 h 20 ## 584 2019-12-16 d 0 ## 585 2020-01-02 s 10 ## 586 2020-01-10 y 80 ## 587 2020-01-10 w 100 ## 588 2019-12-14 a 10 ## 589 2019-12-24 u 60 ## 590 2019-12-30 x 70 ## 591 2020-01-08 l 40 ## 592 2020-01-01 l 60 ## 593 2019-12-31 e 80 ## 594 2019-12-29 h 60 ## 595 2019-12-29 a 60 ## 596 2019-12-23 u 20 ## 597 2020-01-08 v 10 ## 598 2019-12-27 z 100 ## 599 2020-01-03 m 50 ## 600 2019-12-22 o 30 ## 601 2019-12-13 y 80 ## 602 2020-01-01 t 30 ## 603 2019-12-15 j 
40 ## 604 2020-01-03 w 90 ## 605 2019-12-27 a 70 ## 606 2020-01-04 f 60 ## 607 2019-12-17 s 20 ## 608 2020-01-09 b 100 ## 609 2019-12-22 t 50 ## 610 2020-01-09 t 10 ## 611 2019-12-22 z 40 ## 612 2019-12-29 h 0 ## 613 2020-01-02 i 30 ## 614 2019-12-27 e 20 ## 615 2019-12-23 d 50 ## 616 2019-12-17 u 0 ## 617 2020-01-04 x 80 ## 618 2019-12-28 f 30 ## 619 2019-12-18 w 30 ## 620 2020-01-01 n 50 ## 621 2019-12-19 p 80 ## 622 2019-12-30 m 80 ## 623 2019-12-25 e 50 ## 624 2019-12-19 b 10 ## 625 2020-01-02 l 40 ## 626 2019-12-20 r 90 ## 627 2019-12-19 n 0 ## 628 2019-12-22 b 10 ## 629 2020-01-08 q 80 ## 630 2019-12-30 i 50 ## 631 2020-01-03 w 70 ## 632 2019-12-15 s 100 ## 633 2019-12-21 c 40 ## 634 2019-12-13 p 10 ## 635 2019-12-26 k 90 ## 636 2019-12-28 f 70 ## 637 2020-01-10 r 0 ## 638 2019-12-19 m 90 ## 639 2020-01-09 j 60 ## 640 2019-12-13 g 50 ## 641 2019-12-17 g 10 ## 642 2020-01-09 i 50 ## 643 2020-01-03 n 100 ## 644 2019-12-23 s 20 ## 645 2019-12-14 b 90 ## 646 2020-01-08 i 80 ## 647 2019-12-12 t 60 ## 648 2019-12-24 j 60 ## 649 2019-12-23 z 80 ## 650 2020-01-09 v 30 ## 651 2020-01-02 q 40 ## 652 2019-12-25 c 60 ## 653 2020-01-07 w 90 ## 654 2019-12-17 i 40 ## 655 2019-12-13 b 0 ## 656 2019-12-21 i 20 ## 657 2019-12-23 x 40 ## 658 2019-12-19 j 70 ## 659 2019-12-21 i 100 ## 660 2019-12-21 t 90 ## 661 2020-01-09 w 20 ## 662 2019-12-31 v 60 ## 663 2020-01-07 x 70 ## 664 2019-12-25 e 70 ## 665 2019-12-30 q 60 ## 666 2020-01-10 g 50 ## 667 2019-12-17 e 0 ## 668 2020-01-04 t 20 ## 669 2019-12-20 a 100 ## 670 2019-12-28 v 60 ## 671 2019-12-30 c 90 ## 672 2019-12-23 h 70 ## 673 2019-12-25 w 10 ## 674 2020-01-05 o 60 ## 675 2019-12-27 c 90 ## 676 2019-12-31 n 10 ## 677 2019-12-17 y 100 ## 678 2019-12-29 p 10 ## 679 2019-12-21 g 40 ## 680 2020-01-08 w 60 ## 681 2019-12-19 b 50 ## 682 2019-12-15 m 90 ## 683 2020-01-06 c 90 ## 684 2019-12-28 m 10 ## 685 2020-01-02 y 60 ## 686 2020-01-05 d 80 ## 687 2019-12-24 n 0 ## 688 2019-12-23 g 0 ## 689 2019-12-22 q 80 ## 690 2019-12-28 s 
30 ## 691 2019-12-30 m 50 ## 692 2019-12-29 f 10 ## 693 2019-12-20 v 30 ## 694 2019-12-16 q 80 ## 695 2019-12-30 k 50 ## 696 2020-01-09 d 0 ## 697 2020-01-03 w 60 ## 698 2019-12-25 m 80 ## 699 2019-12-20 f 80 ## 700 2019-12-17 l 40 ## 701 2019-12-25 d 100 ## 702 2019-12-20 f 90 ## 703 2019-12-28 w 100 ## 704 2019-12-13 c 60 ## 705 2020-01-08 e 90 ## 706 2020-01-02 q 30 ## 707 2019-12-24 m 50 ## 708 2019-12-28 i 100 ## 709 2020-01-09 s 0 ## 710 2020-01-05 z 30 ## 711 2020-01-08 w 50 ## 712 2020-01-02 d 0 ## 713 2019-12-29 i 90 ## 714 2019-12-14 f 0 ## 715 2019-12-26 s 20 ## 716 2019-12-19 p 20 ## 717 2020-01-10 h 80 ## 718 2019-12-13 k 90 ## 719 2019-12-25 i 0 ## 720 2020-01-10 i 80 ## 721 2020-01-02 r 70 ## 722 2020-01-10 t 90 ## 723 2019-12-24 v 0 ## 724 2019-12-28 x 10 ## 725 2019-12-24 t 60 ## 726 2019-12-16 a 20 ## 727 2020-01-03 r 70 ## 728 2020-01-07 n 50 ## 729 2019-12-28 j 10 ## 730 2019-12-12 r 20 ## 731 2019-12-21 w 0 ## 732 2020-01-06 u 70 ## 733 2019-12-28 e 100 ## 734 2019-12-12 i 70 ## 735 2020-01-09 n 80 ## 736 2020-01-05 k 60 ## 737 2019-12-30 d 60 ## 738 2019-12-23 y 20 ## 739 2019-12-29 q 70 ## 740 2019-12-23 x 0 ## 741 2019-12-27 l 70 ## 742 2020-01-08 f 40 ## 743 2020-01-06 t 100 ## 744 2019-12-18 c 80 ## 745 2020-01-07 t 40 ## 746 2019-12-25 h 60 ## 747 2019-12-25 z 50 ## 748 2020-01-09 l 30 ## 749 2020-01-08 w 50 ## 750 2020-01-06 r 70 ## 751 2019-12-16 g 70 ## 752 2019-12-14 b 40 ## 753 2019-12-28 x 10 ## 754 2020-01-09 q 30 ## 755 2020-01-05 l 20 ## 756 2020-01-01 f 100 ## 757 2020-01-02 l 30 ## 758 2020-01-04 n 10 ## 759 2019-12-30 x 80 ## 760 2020-01-08 z 50 ## 761 2019-12-23 u 60 ## 762 2020-01-05 l 70 ## 763 2019-12-21 g 20 ## 764 2019-12-16 r 40 ## 765 2020-01-10 z 10 ## 766 2020-01-02 u 50 ## 767 2019-12-26 q 30 ## 768 2019-12-12 h 70 ## 769 2019-12-21 k 80 ## 770 2020-01-05 p 40 ## 771 2019-12-23 l 10 ## 772 2019-12-29 k 20 ## 773 2020-01-08 b 40 ## 774 2020-01-05 p 40 ## 775 2019-12-29 g 100 ## 776 2019-12-14 d 90 ## 777 2019-12-26 r 
40 ## 778 2019-12-16 t 10 ## 779 2020-01-10 w 20 ## 780 2019-12-17 k 100 ## 781 2019-12-28 m 80 ## 782 2019-12-26 y 100 ## 783 2019-12-22 h 90 ## 784 2020-01-03 e 90 ## 785 2020-01-06 f 80 ## 786 2019-12-12 q 0 ## 787 2019-12-27 e 90 ## 788 2020-01-05 c 30 ## 789 2019-12-16 f 80 ## 790 2019-12-27 h 0 ## 791 2019-12-19 h 100 ## 792 2019-12-24 n 100 ## 793 2019-12-25 s 90 ## 794 2020-01-10 s 100 ## 795 2019-12-19 b 70 ## 796 2019-12-13 p 80 ## 797 2019-12-22 a 40 ## 798 2020-01-03 v 80 ## 799 2019-12-24 f 0 ## 800 2019-12-14 x 40 ## 801 2019-12-15 g 80 ## 802 2019-12-16 h 100 ## 803 2019-12-17 a 10 ## 804 2019-12-31 g 40 ## 805 2019-12-12 y 10 ## 806 2020-01-04 z 70 ## 807 2019-12-31 q 60 ## 808 2019-12-19 t 30 ## 809 2019-12-22 c 40 ## 810 2019-12-29 d 100 ## 811 2019-12-20 k 0 ## 812 2019-12-16 a 30 ## 813 2020-01-07 o 60 ## 814 2019-12-26 k 20 ## 815 2019-12-26 o 100 ## 816 2020-01-03 o 50 ## 817 2019-12-26 z 20 ## 818 2019-12-23 l 10 ## 819 2020-01-02 f 30 ## 820 2019-12-28 r 70 ## 821 2020-01-07 a 10 ## 822 2019-12-13 z 100 ## 823 2019-12-22 m 100 ## 824 2020-01-07 r 0 ## 825 2019-12-28 y 40 ## 826 2020-01-04 u 60 ## 827 2019-12-27 b 60 ## 828 2019-12-26 j 50 ## 829 2019-12-23 h 80 ## 830 2019-12-29 c 50 ## 831 2019-12-24 b 0 ## 832 2019-12-23 w 10 ## 833 2020-01-08 o 0 ## 834 2019-12-12 v 40 ## 835 2019-12-19 j 90 ## 836 2019-12-16 t 80 ## 837 2019-12-23 f 70 ## 838 2019-12-28 g 70 ## 839 2019-12-28 a 80 ## 840 2019-12-24 y 100 ## 841 2019-12-17 g 30 ## 842 2020-01-09 r 90 ## 843 2019-12-23 t 60 ## 844 2019-12-19 n 40 ## 845 2019-12-12 u 60 ## 846 2019-12-30 e 100 ## 847 2019-12-21 a 30 ## 848 2020-01-10 y 100 ## 849 2019-12-17 c 100 ## 850 2020-01-09 u 20 ## 851 2020-01-10 u 20 ## 852 2020-01-01 p 10 ## 853 2019-12-20 z 60 ## 854 2020-01-07 g 30 ## 855 2019-12-20 z 70 ## 856 2019-12-14 c 90 ## 857 2019-12-12 s 0 ## 858 2020-01-07 k 60 ## 859 2019-12-19 g 30 ## 860 2019-12-27 e 80 ## 861 2019-12-23 i 10 ## 862 2020-01-04 w 100 ## 863 2019-12-24 n 90 ## 864 
2020-01-04 e 0 ## 865 2020-01-02 l 60 ## 866 2020-01-09 c 50 ## 867 2019-12-31 o 30 ## 868 2019-12-25 g 0 ## 869 2019-12-16 o 60 ## 870 2019-12-13 r 10 ## 871 2019-12-25 y 30 ## 872 2020-01-08 t 60 ## 873 2019-12-24 v 30 ## 874 2020-01-07 b 80 ## 875 2020-01-08 z 70 ## 876 2020-01-01 v 40 ## 877 2020-01-09 g 100 ## 878 2019-12-15 r 10 ## 879 2019-12-27 u 40 ## 880 2019-12-18 r 30 ## 881 2020-01-04 y 100 ## 882 2019-12-16 f 60 ## 883 2019-12-22 t 100 ## 884 2020-01-10 l 40 ## 885 2020-01-06 n 50 ## 886 2019-12-20 d 100 ## 887 2019-12-18 h 100 ## 888 2019-12-19 d 50 ## 889 2019-12-28 w 60 ## 890 2019-12-27 z 10 ## 891 2020-01-01 d 60 ## 892 2019-12-27 l 20 ## 893 2019-12-22 t 20 ## 894 2020-01-07 n 30 ## 895 2020-01-09 g 30 ## 896 2019-12-26 z 50 ## 897 2019-12-25 i 50 ## 898 2020-01-08 q 60 ## 899 2019-12-15 v 90 ## 900 2019-12-25 q 30 ## 901 2019-12-19 k 60 ## 902 2019-12-26 p 90 ## 903 2019-12-15 n 0 ## 904 2019-12-22 c 30 ## 905 2019-12-24 z 70 ## 906 2019-12-20 t 10 ## 907 2019-12-31 y 90 ## 908 2019-12-15 p 30 ## 909 2020-01-06 n 40 ## 910 2020-01-08 u 50 ## 911 2019-12-25 o 10 ## 912 2020-01-03 d 0 ## 913 2020-01-08 u 100 ## 914 2020-01-08 k 0 ## 915 2019-12-14 x 30 ## 916 2019-12-20 o 100 ## 917 2020-01-08 r 90 ## 918 2019-12-31 d 50 ## 919 2019-12-19 w 100 ## 920 2019-12-22 x 30 ## 921 2020-01-06 k 70 ## 922 2020-01-02 o 90 ## 923 2019-12-19 x 90 ## 924 2020-01-06 x 70 ## 925 2019-12-13 v 90 ## 926 2019-12-30 i 80 ## 927 2019-12-17 t 10 ## 928 2019-12-15 z 40 ## 929 2020-01-01 v 50 ## 930 2020-01-01 j 40 ## 931 2020-01-07 y 0 ## 932 2020-01-08 m 100 ## 933 2020-01-03 u 20 ## 934 2019-12-16 o 90 ## 935 2019-12-14 d 20 ## 936 2020-01-01 z 80 ## 937 2019-12-30 n 80 ## 938 2019-12-26 z 80 ## 939 2019-12-28 x 60 ## 940 2019-12-22 v 30 ## 941 2019-12-21 p 40 ## 942 2020-01-06 o 40 ## 943 2019-12-28 t 80 ## 944 2019-12-27 k 50 ## 945 2019-12-27 m 60 ## 946 2020-01-03 e 10 ## 947 2020-01-04 t 10 ## 948 2019-12-16 q 10 ## 949 2019-12-20 w 40 ## 950 2019-12-30 h 0 ## 
951 2020-01-04 l 40 ## 952 2020-01-07 r 100 ## 953 2019-12-24 y 70 ## 954 2019-12-17 p 80 ## 955 2020-01-05 n 90 ## 956 2019-12-19 b 100 ## 957 2020-01-03 a 40 ## 958 2020-01-09 o 100 ## 959 2020-01-02 x 70 ## 960 2019-12-14 n 0 ## 961 2019-12-27 q 30 ## 962 2019-12-27 m 60 ## 963 2020-01-08 y 20 ## 964 2020-01-04 g 0 ## 965 2019-12-22 f 80 ## 966 2020-01-05 l 40 ## 967 2020-01-01 d 20 ## 968 2019-12-24 x 40 ## 969 2019-12-12 v 40 ## 970 2019-12-20 d 70 ## 971 2020-01-07 m 0 ## 972 2020-01-04 h 20 ## 973 2019-12-12 h 40 ## 974 2020-01-08 t 30 ## 975 2020-01-05 y 90 ## 976 2019-12-29 t 40 ## 977 2019-12-16 o 80 ## 978 2020-01-03 f 20 ## 979 2019-12-17 f 80 ## 980 2019-12-15 g 20 ## 981 2019-12-23 u 60 ## 982 2019-12-30 v 100 ## 983 2019-12-18 f 80 ## 984 2019-12-24 r 0 ## 985 2019-12-21 r 90 ## 986 2019-12-18 t 20 ## 987 2019-12-28 b 80 ## 988 2019-12-23 v 0 ## 989 2019-12-14 a 0 ## 990 2020-01-03 h 80 ## 991 2019-12-19 a 0 ## 992 2019-12-13 s 60 ## 993 2019-12-25 j 0 ## 994 2019-12-15 j 40 ## 995 2019-12-24 g 60 ## 996 2019-12-31 u 20 ## 997 2019-12-16 n 20 ## 998 2019-12-26 f 50 ## 999 2019-12-22 u 20 ## 1000 2019-12-21 t 20
```

---

- Warns you when you make mistakes (!):

```r
tib1$quanitty
```

<pre style="color: red;"><code>## Warning: Unknown or uninitialised column: 'quanitty'.
</code></pre>

```
## NULL
```

```r
df1$quanitty
```

```
## NULL
```

---

- Can also create via `tribble()`:

```r
tribble(
  ~a, ~b, ~c,
  "a", 1, 2.2,
  "b", 2, 4.3,
  "c", 3, 3.4
)
```

```
## # A tibble: 3 x 3
##   a         b     c
##   <chr> <dbl> <dbl>
## 1 a         1   2.2
## 2 b         2   4.3
## 3 c         3   3.4
```

---

- Can build on previously defined variables during creation:

```r
tibble(x = 1:5, y = x^2)
```

```
## # A tibble: 5 x 2
##       x     y
##   <int> <dbl>
## 1     1     1
## 2     2     4
## 3     3     9
## 4     4    16
## 5     5    25
```

```r
data.frame(x = 1:5, y = x^2)
```

<pre style="color: red;"><code>## Error in data.frame(x = 1:5, y = x^2): object 'x' not found
</code></pre>

---

- Will never turn your strings into factors, will never change your column names:

```r
tib1 <- readr::read_csv("../data/bad_args_test.csv", col_types = cols())
colnames(tib1)
```

```
## [1] "col1" "col2" "col3"
```

```r
tib1$col3
```

```
## [1] "a" "b" "c" "d"
```

```r
df1 <- read.csv("../data/bad_args_test.csv")
colnames(df1)
```

```
## [1] "ï..col1" "col2" "col3"
```

```r
df1$col3
```

```
## [1] a b c d
## Levels: a b c d
```

---

Though one ought to remember a `tibble` is still a `data.frame`:

```r
class(tib1)
```

```
## [1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
```

```r
class(df1)
```

```
## [1] "data.frame"
```

---

class: section-slide

# End of Detour

---

## Consistency - Example: `stringr`

> a cohesive set of functions designed to make working with strings as easy as possible.

```r
strings_vec <- c("I'm feeling fine", "I'm perfectly OK", "Nothing is wrong!")
str_length(strings_vec)
```

```
## [1] 16 16 17
```

```r
str_c(strings_vec, collapse = ", ")
```

```
## [1] "I'm feeling fine, I'm perfectly OK, Nothing is wrong!"
```

```r
str_sub(strings_vec, 1, 3)
```

```
## [1] "I'm" "I'm" "Not"
```

---

```r
str_detect(strings_vec, "I'm")
```

```
## [1] TRUE TRUE FALSE
```

```r
str_replace(strings_vec, "I'm", "You're")
```

```
## [1] "You're feeling fine" "You're perfectly OK" "Nothing is wrong!"
```

```r
str_split("Do you know regex?", " ")
```

```
## [[1]]
## [1] "Do" "you" "know" "regex?"
```

```r
str_extract(strings_vec, "[aeiou]")
```

```
## [1] "e" "e" "o"
```

```r
str_count(strings_vec, "[A-Z]")
```

```
## [1] 1 3 1
```

---

## The Pipe

Remember this?

```r
mean_bi <- mean(okcupid$income[okcupid$sex == "f" &
                                 okcupid$age > 30 &
                                 okcupid$orientation == "bisexual"],
                na.rm = TRUE)
mean_gay <- mean(okcupid$income[okcupid$sex == "f" &
                                  okcupid$age > 30 &
                                  okcupid$orientation == "gay"],
                 na.rm = TRUE)
mean_straight <- mean(okcupid$income[okcupid$sex == "f" &
                                       okcupid$age > 30 &
                                       okcupid$orientation == "straight"],
                      na.rm = TRUE)
data.frame(orientation = c("bisexual", "gay", "straight"),
           income_mean = c(mean_bi, mean_gay, mean_straight))
```

```
##   orientation income_mean
## 1    bisexual   133421.05
## 2         gay    86489.36
## 3    straight    85219.74
```

---

Doesn't this make much more sense?

```r
okcupid %>%
  filter(sex == "f", age > 30) %>%
  group_by(orientation) %>%
  summarize(income_mean = mean(income, na.rm = TRUE))
```

```
## # A tibble: 3 x 2
##   orientation income_mean
##   <chr>             <dbl>
## 1 bisexual        133421.
## 2 gay              86489.
## 3 straight         85220.
```

- Read as:
    - Take the OKCupid data,
    - Filter only women above the age of 30,
    - And for each group of sexual orientation,
    - Give me the average income

---

- Focus on verbs (actions), not nouns (objects)
- Can always access the dataset from the previous stage with "`.`":

```r
okcupid %>%
  filter(str_count(essay0) > median(str_count(.$essay0), na.rm = TRUE))
```

- Operates not just on data frames or tibbles:

```r
strings_vec %>% str_to_title()
```

```
## [1] "I'm Feeling Fine" "I'm Perfectly Ok" "Nothing Is Wrong!"
```

- No intermediate objects
- Don't strive to make the longest possible pipe (though it is a fun experiment)
- Tools exist for debugging

---

And if you want to throw in the median and the group size `n`:

```r
okcupid %>%
  filter(sex == "f", age > 30) %>%
  group_by(orientation) %>%
  summarize(income_mean = mean(income, na.rm = TRUE),
            income_median = median(income, na.rm = TRUE),
            n = n())
```

```
## # A tibble: 3 x 4
##   orientation income_mean income_median     n
##   <chr>             <dbl>         <dbl> <int>
## 1 bisexual        133421.         50000   652
## 2 gay              86489.         40000   664
## 3 straight         85220.         60000 10436
```

---

And if you want this for the age as well:

```r
okcupid %>%
  filter(sex == "f", age > 30) %>%
  group_by(orientation) %>%
  summarize_at(vars(income, age), list(mean = mean, median = median), na.rm = TRUE)
```

```
## # A tibble: 3 x 5
##   orientation income_mean age_mean income_median age_median
##   <chr>             <dbl>    <dbl>         <dbl>      <dbl>
## 1 bisexual        133421.     37.8         50000         36
## 2 gay              86489.     40.4         40000         38
## 3 straight         85220.     40.7         60000         38
```

Now *this* is a language for Data Science. But we're getting ahead of ourselves.

---

## `ggplot2`

<img src="images/ave_mariah_ggridges.png" style="width: 80%" />

.font80percent[
[Ave Mariah / Giora Simchoni](http://giorasimchoni.com/2017/12/10/2017-12-10-ave-mariah/)
]

---

## `ggplot2`

<img src="images/soviet_space_dogs.png" style="width: 50%" />

.font80percent[
[Soviet Space Dogs / David Smale](https://davidsmale.netlify.com/portfolio/soviet-space-dogs-part-2/)
]

---

## `ggplot2`

<img src="images/tennisBig4.gif" style="width: 70%" />

.font80percent[
[Federer, Nadal, Djokovic and Murray, Love.
/ Giora Simchoni](http://giorasimchoni.com/2017/05/01/2017-05-01-federer-nadal-djokovic-and-murray-love/)
]

---

## `ggplot2`

<img src="images/washington_heat.png" style="width: 40%" />

.font80percent[
[NYT-style urban heat island maps / Katie Jolly](https://www.katiejolly.io/blog/2019-08-28/nyt-urban-heat)
]

---

## `ggplot2`

<img src="images/marriage_by_state.png" style="width: 90%" />

.font80percent[
[A map of marriage rates, state by state / Unknown](https://www.r-graph-gallery.com/328-hexbin-map-of-the-usa.html)
]

---

## `ggplot2`

<img src="images/calendar_graph.png" style="width: 70%" />

.font80percent[
[Calendar-based graphics for visualizing people’s daily schedules / Earo Wang](https://pdf.earo.me/calendar-vis.pdf)
]

---

## The Community

.pull-left[

- 100% Open Source on GitHub
- Cheatsheets for everything
- Documentation for humans, package websites, webinars, free books (start with [R4DS](https://r4ds.had.co.nz/))
- [RStudio Community forum](https://community.rstudio.com/)
- [RLadies](https://rladies.org/) worldwide branches .font80percent[(who will pick up the 🥊 and create RLadies TLV?)]
- Very strong on Twitter [#rstats](https://twitter.com/search?q=%23rstats)

]

.pull-right[

<a href="https://rstudio.com/resources/cheatsheets/"><img src="images/stringr_cheatsheet.png" style="width: 100%" /></a>

]