Recently Updated Pages
Heterogeneous Graph Learning
Knowledge graphs are visualizations of information with multi-type relations (edges) among some mu...
Propensity Score Weighting Analysis
Unlike randomized clinical trials, observational studies must adjust for differences such as conf...
Survival Analysis I
Survival analysis measures the time until an event occurs. It doesn't only measure death as an...
Advanced Machine Learning
Recall that in an ordinary multiple linear regression, we have a set of p predictor variables mea...
Forecasting with Geospatial Data
Geo-statistics is a subfield of statistics focused on spatial or spatiotemporal datasets, AKA dat...
Sampling
In the practical use of statistics, we don't have an infinite amount of data. An enormous amount ...
Scala
I'll start with a disclaimer: These are notes written by an experienced Java dev, thus some le...
DataFrames and Advanced Techniques
A Spark DataSet is an extension of the RDD object. It has rows, can run queries, and has a schema...
Intro to Spark / RDDs
Apache Spark: Spark is a fast and general engine for large-scale data processing. The user writes...
File Structure and Linked Views
After adding a lot of different event listeners, the JavaScript file can become messy. This secti...
Data Driven Documents
Introduction: D3 is a JS library that can be used to create charts and visualizations, but to call...
Layouts and Structured Data
Now that I've covered the basics of programming in D3, let's take a look at some of the other coo...
Making Graphs
Scales: Scales are functions that map from an input domain to an output range. Linear Scales: Lin...
Dynamic and Interactive Content
Thus far we've looked at building static content, but the backbone of D3.js is its beautiful tr...
The Basics of Design
Data visualizations should be easy to interpret and look credible. To do this there are several f...
GLM for Correlated Data
So far the models we've covered assume independence between observations collected on separate in...
Survival - Time to Failure
Analysis of survival data is more complex than other methods we've seen so far; we can't jus...
Time Series Models
In standard regression we must assume observations are independent of one another, but with ...
Gamma Regression
Consider a continuous dependent variable that is positive-valued, such as the length of a hospital ...
Correlated Data in Clinical Trials
Note: My BS857 Notebook on Correlated Data goes much further in depth than the below. So far we ...