Fall 2020 EDAV Community Contributions

Chapter 46 Translation of visualization related to geo data

Hanlin Tong and Wendy Qian

My partner and I wanted to contribute to the community by translating the following paper into Chinese. Published in 2019 by a group of scholars from Italy, the paper describes a method intended to help both experts and non-experts explore and interpret spatio-temporal patterns of ground deformation in urban areas. To this end, it introduces two visualization applications: one displays a map of the mean deformation velocity, and the other animates the cumulative deformation time series. Both visualizations are overlaid on a three-dimensional map and can be accessed on the Web through free and open-source software (FOSS). We were inspired by this paper not only because it introduced a new method, but also because it reminded us that data visualization can and should be interpretable by non-experts, not by experts alone.

This paper has been a useful resource for us, and we hope our translation will be helpful to our classmates as well.

The original paper can be found on the ISPRS website, and our translation is available in our GitHub repo.