# 96 A brief instruction for the half semester of EDAV5702

Siyuan Sang

## 96.2 The packages you should get familiar with

library(ggplot2)
library(tidyverse)

## 96.3 Pay attention to the rules learned in class

#### 96.3.0.2 My code goes like this:

library(openintro)
ggplot(loans_full_schema, aes(x =loan_purpose, y = loan_amount)) +
geom_boxplot() +
coord_flip()

#### 96.3.0.3 While the correct one should be:

library(openintro)
ggplot(loans_full_schema, aes(x = fct_reorder(loan_purpose, loan_amount), y = loan_amount)) +
geom_boxplot() +
coord_flip()

## 96.4 Try your best to imporve the graph

#### 96.4.0.1 As a course of data vitualization, in my view, drawing a graph should never be the end of the story. There’re many ways to make your graphs look more clear and refined. Here I give an example of the scatterplot of housing price.

cor_df <- ames %>%
group_by(Neighborhood) %>%
summarize(cor = cor(area, price), beta = lm(price~area)\$coefficients[2], meanprice = mean(price)) %>%
ungroup() %>%
arrange(beta)
ggplot(cor_df, aes(meanprice, beta)) +
geom_point() +
geom_smooth(method="lm", se= FALSE)

#### 96.4.0.2 It’s now just a set of points with a line in this graph. Now I want to improve it. Considering our goal to make an analysis. It can be a good idea to put the names of Neighborhoods inside.

ggplot(cor_df, aes(meanprice, beta, label = Neighborhood)) +
geom_point() +
geom_text(nudge_y = 10, size = 3) +
geom_smooth(method="lm", se= FALSE)
#### In this graph, although a lot names are in a mess, we can easily figure out the Neighborhood name of the oulier, which is an observable features of the dataset.