Community contributions for EDAV Fall 2019

Chapter 49 Modeling links

49.1 Exploring Financial Models

Shreyas Jadhav (sj3006), Andrei Sipos (ags2202), Gideon Teitel (gt2288)

We prepared a presentation on the exploration and visualization of financial models and delivered it in class on 28 October 2019.

The presentation analysed three models:

  • Modern Portfolio Theory (Markowitz Model)
  • Capital Asset Pricing Model (CAPM)
  • Brownian Motion and Bates Model - Jump Diffusion

We analysed and implemented the models in Python and R.

LINKS:

Code for Modern Portfolio Theory (Markowitz Model) and Capital Asset Pricing Model (CAPM): https://github.com/shreyasj3006/Exploring-Financial-Models

Link for the entire presentation: https://drive.google.com/open?id=0B9K8qs96dJgzZGhkNHRvS051dmlDc2dwcWl6enNIQzdXcHhJ

Note: Please use your LionMail ID to access this link.

Code for Simple Brownian Motion Simulation:

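# Parameters: drift, volatility, time horizon, number of steps, initial price
miu <- 0.01; sigma <- 0.03; T <- 1/12; n <- 1000; P0 <- 100
dt <- T / n

# Time grid with n + 1 points (length.out avoids floating-point surprises in seq)
t <- seq(0, T, length.out = n + 1)

# Brownian motion with drift: start at P0 and accumulate n Gaussian increments
Price <- c(P0, miu * dt + sigma * sqrt(dt) * rnorm(n, mean = 0, sd = 1))
Price <- cumsum(Price)
plot(t, Price, type = 'l', ylab = "Price P(t)", xlab = "Time t", main = "Brownian Motion")

# install.packages("sde")
library(sde)

nt <- 10  # number of simulated paths

# One row per path, one column per time point
X <- matrix(0, nrow = nt, ncol = length(t))

for (i in 1:nt) {
  X[i, ] <- GBM(x = P0, r = miu, sigma = sigma, T = T, N = n)
}

# Draw the first path, then overlay the remaining ones in different colours
plot(t, X[1, ], type = 'l', ylim = c(min(X), max(X)), col = 1, ylab = "Price P(t)", xlab = "Time t", main = "Geometric Brownian Motion")

for (i in 2:nt) {
  lines(t, X[i, ], col = i)
}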
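To make explicit what a geometric Brownian motion path is, here is a minimal base-R sketch that needs no extra package. It assumes the standard closed-form solution P(t) = P0 * exp((miu - sigma^2 / 2) * t + sigma * W(t)). The helper name `simulate_gbm`, the variable names `horizon` and `t_grid`, and the seed are illustrative choices and are not part of the original presentation code.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Same parameter values as the code above
miu <- 0.01; sigma <- 0.03; horizon <- 1/12; n <- 1000; P0 <- 100
t_grid <- seq(0, horizon, length.out = n + 1)

# Hypothetical helper: one GBM path from the closed-form solution
simulate_gbm <- function(P0, miu, sigma, t_grid) {
  dt <- diff(t_grid)
  # Brownian motion W(t): cumulative sum of independent N(0, dt) increments
  W <- c(0, cumsum(sqrt(dt) * rnorm(length(dt))))
  P0 * exp((miu - sigma^2 / 2) * t_grid + sigma * W)
}

# Ten independent paths, one per column: an (n + 1) x 10 matrix
paths <- replicate(10, simulate_gbm(P0, miu, sigma, t_grid))

matplot(t_grid, paths, type = "l", lty = 1,
        ylab = "Price P(t)", xlab = "Time t",
        main = "Geometric Brownian Motion (base R sketch)")
```

Because the price is an exponential of the Brownian path, every simulated value stays positive, which is the usual reason for preferring GBM over plain Brownian motion when modelling prices.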

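# install.packages("ESGtoolkit")
library(ESGtoolkit)

# Gaussian shocks for 10 paths over a horizon of 50 at quarterly frequency
eps0 <- simshocks(n = 10, horizon = 50, frequency = "quarterly")

# Simulate the diffusion from the initial value P0 (defined above), driven by those shocks.
# model = "GBM" is called here without any jump arguments.
sim.GBM <- simdiff(n = 10, horizon = 50, frequency = "quarterly", model = "GBM", P0, theta1 = 0.03, theta2 = 0.1, eps = eps0)
matplot(time(sim.GBM), sim.GBM, type = 'l', ylab = "Price P(t)", xlab = "Time t", main = "Bates Model - Jump Diffusion Simulation")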
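The plot above is titled as a jump diffusion, while the call itself uses model = "GBM". As a rough illustration of what adding jumps means, the base-R sketch below simulates a Merton-style jump diffusion. This is simpler than the Bates model, which also makes the volatility stochastic. The helper name `simulate_jump_diffusion` and the jump parameters `lambda`, `mu_j` and `sigma_j` are hypothetical values chosen for illustration. Only the drift 0.03, the volatility 0.1, the horizon of 50 and the quarterly step come from the call above.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical helper: one Merton-style jump-diffusion path on n equal time steps.
# lambda is the expected number of jumps per unit of time;
# each jump adds a N(mu_j, sigma_j^2) amount to the log-price.
simulate_jump_diffusion <- function(P0, miu, sigma, lambda, mu_j, sigma_j, horizon, n) {
  dt <- horizon / n
  # Expected relative jump size, subtracted from the drift as compensation
  k <- exp(mu_j + sigma_j^2 / 2) - 1
  # Diffusion part of each log-return
  diffusion <- (miu - sigma^2 / 2 - lambda * k) * dt + sigma * sqrt(dt) * rnorm(n)
  # Number of jumps in each step and the total log-size of those jumps
  n_jumps <- rpois(n, lambda * dt)
  jumps <- rnorm(n, mean = n_jumps * mu_j, sd = sqrt(n_jumps) * sigma_j)
  P0 * exp(c(0, cumsum(diffusion + jumps)))
}

horizon <- 50
n_steps <- 200  # quarterly steps over the horizon
jd_paths <- replicate(10, simulate_jump_diffusion(
  P0 = 100, miu = 0.03, sigma = 0.1,
  lambda = 0.5, mu_j = -0.05, sigma_j = 0.1,  # hypothetical jump parameters
  horizon = horizon, n = n_steps
))

matplot(seq(0, horizon, length.out = n_steps + 1), jd_paths, type = "l", lty = 1,
        ylab = "Price P(t)", xlab = "Time t",
        main = "Jump diffusion (Merton-style sketch)")
```

Setting `lambda = 0` removes the jump component, so the helper then simulates plain geometric Brownian motion.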

49.2 Overview of the t-SNE algorithm

Arjun Dhillon

For my community contribution, I’ll be delivering a 5-minute lightning presentation on t-distributed stochastic neighbor embedding (t-SNE). The t-SNE algorithm is useful for representing high-dimensional data in 2 or 3 dimensions. It works by mapping high-dimensional data points to 2- or 3-dimensional points in such a way that similar data points have a higher probability of being near one another. It is especially useful for visualizing latent representations of data in neural networks.

The main idea of t-SNE is to minimize the Kullback-Leibler divergence of the distribution \(Q\) of points in the remapped space from the distribution \(P\) of points in the original space:

\[KL(P||Q) = \sum_{i \ne j} p_{ij} \log \frac{p_{ij}}{q_{ij}},\]

where \(p_{ij}\) represents the similarity of two points in the original space. It is built from the conditional similarity

\[p_{j|i} = \frac{\exp( - \lVert x_i - x_j \rVert^2 / 2 \sigma_i^2)}{\sum_{k \ne i} \exp( - \lVert x_i - x_k \rVert^2 / 2 \sigma_i^2)},\]

symmetrized over the \(N\) data points as \(p_{ij} = (p_{j|i} + p_{i|j}) / 2N\), and \(q_{ij}\) represents the similarity of the mappings \(y_i\):

\[q_{ij} = \frac{(1 + \lVert y_i - y_j \rVert^2)^{-1}}{\sum_{k \ne l} (1 + \lVert y_k - y_l \rVert^2)^{-1}}\]
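The quantities above can be computed directly for a small data set. The base-R sketch below is an illustration under stated assumptions and is not a t-SNE implementation. It uses one fixed bandwidth `sigma` for every point, whereas t-SNE tunes \(\sigma_i\) per point from a perplexity setting. It follows the standard formulation in which the Gaussian similarities are normalized per row and then symmetrized, and \(Q\) is normalized over all pairs. It only evaluates the divergence for two given embeddings and does not minimize it. The toy data and the helper names `sq_dist`, `q_matrix` and `kl_divergence` are hypothetical.

```r
set.seed(123)  # arbitrary seed, only for reproducibility

# Toy data: two well-separated clusters of 10 points each in 5 dimensions
X <- rbind(matrix(rnorm(50, mean = 0), ncol = 5),
           matrix(rnorm(50, mean = 6), ncol = 5))
N <- nrow(X)
sigma <- 1  # assumption: one fixed bandwidth instead of a per-point sigma_i

# Squared Euclidean distances between all rows of a matrix
sq_dist <- function(M) as.matrix(dist(M))^2

# Conditional similarities: Gaussian kernel, each row normalized over k != i
G <- exp(-sq_dist(X) / (2 * sigma^2))
diag(G) <- 0
P_cond <- G / rowSums(G)

# Symmetrized similarities p_ij, which sum to 1 over all pairs
P <- (P_cond + t(P_cond)) / (2 * N)

# Student-t similarities q_ij of an embedding Y, normalized over all pairs
q_matrix <- function(Y) {
  W <- 1 / (1 + sq_dist(Y))
  diag(W) <- 0
  W / sum(W)
}

# KL(P || Q) over pairs i != j; entries with p_ij = 0 contribute nothing
kl_divergence <- function(P, Q) {
  keep <- P > 0
  sum(P[keep] * log(P[keep] / Q[keep]))
}

Y_random <- matrix(rnorm(2 * N), ncol = 2)  # embedding that ignores the data
Y_pca <- prcomp(X)$x[, 1:2]                 # embedding that keeps the two clusters apart

kl_random <- kl_divergence(P, q_matrix(Y_random))
kl_pca <- kl_divergence(P, q_matrix(Y_pca))
print(c(random = kl_random, pca = kl_pca))
```

An embedding that keeps similar points together should typically give the smaller divergence of the two. t-SNE goes further and searches for the embedding that minimizes it.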