Community contributions for EDAV Fall 2019

Chapter 8 Working with data links

8.1 Categorical data cheatsheet

Zhi Qi

This section provides a cheatsheet for working with categorical data. It breaks down which form of graph to use according to the type and number of variables. More information is available at edav.info.

Cheatsheet: https://github.com/michaelqizhi/Categorical-data-cheatsheet/blob/master/Categorical%20Variables%20Cheatsheet.pdf

8.2 Data wrangling with R cheatsheet

Tabitha K. Sugumar

Goal: A reference sheet for functions commonly used in basic data wrangling and cleaning tasks. It consolidates this information in one easy-to-find place for double-checking syntax, which version of a function to use, and so on.

Functions included:

Subsetting: filter, select

Applying: apply, lapply, sapply, vapply

Reshaping: cbind, rbind, merge, melt, cast, gather, spread, arrange

Aggregation: group_by, summarize, aggregate

Manipulation: mutate, transmute, mutate_if, mutate_at, mutate_all, trimws, substr, make_clean_names, na.omit, is.na, remove_empty

Link: https://github.com/tks19/EDAV/blob/master/DataWranglingwithRCheatsheet.pdf
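
As a rough illustration of a few of the base R functions listed above (trimws, is.na, na.omit, aggregate, sapply, vapply), here is a minimal sketch. The scores data frame is invented for this example and does not come from the cheatsheet.

```r
# Toy data, invented for illustration only.
scores <- data.frame(
  name  = c(" Ann", "Bob ", "Cy", "Dee"),
  group = c("a", "b", "a", "b"),
  score = c(90, NA, 70, 80),
  stringsAsFactors = FALSE
)

# Manipulation: strip stray whitespace and count missing values.
scores$name <- trimws(scores$name)
n_missing <- sum(is.na(scores$score))

# Drop incomplete rows before aggregating.
complete <- na.omit(scores)

# Aggregation: mean score per group.
group_means <- aggregate(score ~ group, data = complete, FUN = mean)

# Applying: sapply() guesses the return type, vapply() enforces it.
col_classes  <- sapply(scores, class)
name_lengths <- vapply(scores$name, nchar, integer(1))

print(group_means)
print(name_lengths)
```

The sapply/vapply pair is one case of the "which version of a function to use" question: vapply stops with an error if a result is not a single integer, while sapply silently changes its return shape.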

8.3 Date and Time Cheatsheet in R

Kanika Aggarwal and Swarna Bharathi Mantena

We prepared a cheatsheet for date and time manipulation in R. The GitHub link and the references we used are listed below. Note: the cheatsheet is uploaded as a PDF file in the GitHub repository.

GitHub link:

https://github.com/SwarnaBharathiMantena/EDAV_CommunityContribution

References:

http://ianmadd.github.io/pages/POSIXct_and_POSIXlt.html

https://www.cyclismo.org/tutorial/R/time.html

https://stackoverflow.com/questions/7561400/strptime-function-in-r-to-manipulate-date-time-class

https://astrostatistics.psu.edu/su07/R/html/base/html/strptime.html

https://www.stat.berkeley.edu/~s133/dates.html

https://readr.tidyverse.org/reference/parse_datetime.html

https://stat.ethz.ch/R-manual/R-devel/library/base/html/Sys.time.html

https://learnr.wordpress.com/2010/02/25/ggplot2-plotting-dates-hours-and-minutes/
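
For a flavor of the functions these references discuss (strptime, POSIXct and POSIXlt), here is a small base R sketch. The timestamp and dates are made up for the example, and the snippet is not taken from the cheatsheet itself.

```r
# Parse a timestamp string; tz = "UTC" keeps the result independent
# of the machine's local time zone.
stamp_lt <- strptime("2019-10-31 14:30:00",
                     format = "%Y-%m-%d %H:%M:%S", tz = "UTC")  # POSIXlt
stamp_ct <- as.POSIXct(stamp_lt)                                # POSIXct

# POSIXlt stores components (hour, min, ...) as list fields.
hour_of_day <- stamp_lt$hour

# Arithmetic on POSIXct values is in seconds.
one_hour_later <- stamp_ct + 3600
later_label <- format(one_hour_later, "%H:%M", tz = "UTC")

# Arithmetic on Date values is in days.
due <- as.Date("2019-10-31") + 30
due_label <- format(due, "%d/%m/%Y")

print(c(later_label, due_label))
```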

8.4 rvest cheatsheet

Huayun Xu and Zelin Li

This project provides a cheatsheet for the rvest package in R.

Link: https://github.com/MXKLZL/rvest-cheatsheet/blob/master/contribution.pdf

8.5 tidyverse cheatsheet

Huimin Jiang and Yiming Huang

[Two cheatsheet images appeared here.]

8.6 Python vs R (video)

Nima Chitsazan and Foad Khoshouei

We created a video describing visualization in Python and R. For Python, we focused on the matplotlib library and compared it to R's ggplot2 library.

The video is available on YouTube: https://youtu.be/phVKWXaAStY

The slides are available at: https://github.com/fk2377/EDAVCC

8.7 R package writing (workshop)

Siddhant Shandilya and Mohit Chander Gulla

R packages are an ideal way to package and distribute R code and data for re-use by others.

This workshop provides an overview of how to create your own package in R.

The walkthrough gives step-by-step instructions on how to define your functions, create a project for your package, embed your functions and their documentation in it, and finally compile and build it into an R package that is ready to be shared or published.

All the materials used in the workshop can be found at: https://github.com/siddhantshandilya/EDAV---Community-Contribution-19

If you are interested, the reference links at the end of the PDF go into further detail on how to publish your package on CRAN.
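
The steps described above can be sketched with base R alone. This is only a minimal illustration that uses utils::package.skeleton() as a stand-in; the workshop materials may rely on different tooling. The function add_one, the package name demopkg and the temporary directories are all hypothetical.

```r
# Step 1: define a function (hypothetical) and save it to a code file.
code_file <- file.path(tempdir(), "add_one.R")
writeLines(c("add_one <- function(x) {", "  x + 1", "}"), code_file)

# Step 2: create the package project around that code file.
pkg_root <- file.path(tempdir(), "pkgdemo_parent")
dir.create(pkg_root, showWarnings = FALSE)
utils::package.skeleton(name = "demopkg", code_files = code_file,
                        path = pkg_root, force = TRUE)

# Step 3: the skeleton holds the code (R/), documentation stubs (man/)
# and the DESCRIPTION file, all of which should be edited before building.
pkg_dir <- file.path(pkg_root, "demopkg")
print(list.files(pkg_dir, recursive = TRUE))
```

After filling in DESCRIPTION and the .Rd stubs, running `R CMD build demopkg` from a shell in the parent directory would produce a source tarball that can be shared.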

8.8 Regex (workshop)

Cheng Yan and Chao Huang

This workshop offers a basic understanding of regular expressions and how they can be used to solve various problems. It is divided into four sections, from the definition of regular expressions to their application in EDAV. This page is only a roadmap of the workshop; more details and examples are in the workshop slides.

Definition of Regular Expression

In this section, we define regular expressions and describe typical scenarios where this powerful tool applies.

Basic syntax of Regular Expression

In this section, we introduce the basic syntax of regular expressions, including wildcards, sets, metacharacters, repeated matches, and position matching. These patterns can help solve most of the string manipulation problems we meet in our daily work.
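
To make these syntax elements concrete, here is a short base R sketch using grepl(). The word list and the phone-number strings are invented for illustration and are not taken from the workshop slides.

```r
words <- c("cat", "cot", "coat", "cut", "scatter")

# Wildcard: "." matches any single character.
wildcard <- grepl("c.t", words)        # TRUE TRUE FALSE TRUE TRUE

# Set: "[ao]" matches exactly one of the listed characters.
char_set <- grepl("c[ao]t", words)     # TRUE TRUE FALSE FALSE TRUE

# Repeated match: "{2}" requires the set to match twice in a row.
repeated <- grepl("c[aou]{2}t", words) # FALSE FALSE TRUE FALSE FALSE

# Position matching: "^" and "$" anchor the pattern to start and end.
anchored <- grepl("^c.t$", words)      # TRUE TRUE FALSE TRUE FALSE

# Metacharacter: "\\d" matches a digit (the backslash is doubled
# because it sits inside an R string).
phones <- c("555-1234", "5551234")
digits <- grepl("^\\d{3}-\\d{4}$", phones)  # TRUE FALSE

print(words[anchored])
```

Note that "scatter" matches the unanchored patterns because it contains "cat"; anchoring is what restricts the match to whole strings.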

Advanced syntax of Regular Expression

Besides basic syntax, we also introduce some more advanced techniques, including capture groups and lookarounds. With these tools, you can construct more complex and more powerful regular expressions.
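
The two techniques can be sketched in base R as follows. The sample strings are invented, and the patterns are illustrations rather than the ones used in the workshop. In base R, lookarounds require perl = TRUE.

```r
# Capture groups: parentheses capture pieces of the match, and
# "\\1", "\\2", "\\3" refer back to them in the replacement.
dates <- c("2019-10-31", "2019-11-05")
reordered <- sub("^(\\d{4})-(\\d{2})-(\\d{2})$", "\\3/\\2/\\1", dates)

# Lookahead: match digits only when they are followed by "kg",
# without including "kg" in the match.
weights <- "apples 12kg, pears 7lb, plums 30kg"
kg_values <- regmatches(
  weights, gregexpr("\\d+(?=kg)", weights, perl = TRUE)
)[[1]]

# Lookbehind: match digits only when they are preceded by "$".
prices <- "tea $4, scone $3, tip 2"
dollars <- regmatches(
  prices, gregexpr("(?<=\\$)\\d+", prices, perl = TRUE)
)[[1]]

print(reordered)
print(kg_values)
print(dollars)
```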

Application in EDAV

We show how to compile and use regular expressions in R by solving a string-wrapping problem from PSet2 in one line of code.

For those who want to know more about regular expressions, two books are highly recommended: “Sams Teach Yourself Regular Expressions in 10 Minutes” and “Mastering Regular Expressions”.

8.9 GitHub help session (workshop)

Karthik Rajaraman Iyer and Akshay Pakhle

We held a walk-in help session for peers to resolve issues with the GitHub workflow. Some had trouble setting up environment variables in Git Bash; other common questions concerned the branched workflow, the creation and handling of pull requests, and the appropriate use of issues.

We also helped design a practice assignment for getting a grasp of the GitHub workflow.