Search Results
GLM for Multinomial Outcomes
Multinomial outcomes are much akin to binomial outcomes, with added complexity due to outcomes with more than 2 levels. In such cases it can be difficult to determine an 'order' to the outcomes. Log-linear models can be used for analysis of this type of data....
Multi-Level Modeling
Recall the core of mixed models is that they incorporate fixed and random effects. While single-level models assume one variance, subjects within the same level are correlated in terms of $\sigma_{0j}^2 + \sigma_{1j}^2 X_{ij} + \varepsilon_{ij}$. Where y is an N×1 column vector of the outcome, X i...
Non-Inferiority in Clinical Trials
Usually clinical trials should show if a new treatment is superior to placebo or no treatment, but as we've previously discussed it is not always ethical to give out a placebo when an effective treatment has been identified. The goal of non-inferiority trials...
Effect Modification and Interaction
Interaction is when a treatment effect is different across different subgroups of the population defined by a baseline covariate. Commonly examined effect modifiers include: demographic variables, study location, or baseline prognostic factors. If an interact...
Multiple Imputation
If no missing data is present, our statistical methods provide valid inference only if the following assumptions are met: for Generalized Estimating Equations, the mean function is correctly specified; for likelihood-based methods, the probability density fu...
Multivariate and Joint Models for Longitudinal Data
Longitudinal studies are commonly designed in many research fields in order to see changes over a time interval shared by all participants. Joint modeling consists of two interlinked sub-models with any type of outcome (continuous, binomial, etc). One of the m...
Interim Analysis and Data Monitoring
Clinical trials are often longitudinal in nature. It is often impossible to enroll all subjects at the same time, so it can take a long time to complete a longitudinal study. Over the course of the trial one needs to consider administrative monitoring, safety ...
GLM for Count Data
Generalized linear models for count data are regression techniques available for modeling outcomes describing a type of discrete data where the occurrence might be relatively rare. A common distribution for such a random variable is Poisson. The probability t...
Time Series Models
In standard regression we must assume observations are independent of one another, but with time series data we expect that neighboring observations are correlated. Time series analysis helps organizations understand the underlying causes of trends or sys...
Correlated Data in Clinical Trials
Note: My BS857 Notebook on Correlated Data goes much further in depth than the below. So far we have focused on independent outcomes in clinical trials, but oftentimes we work with correlated or non-independent outcomes in clinical trials, such as crossover ...
Gamma Regression
Consider a continuous dependent variable that is positive-valued, such as the length of a hospital stay, time spent waiting, or the cost of a bill. This type of data is continuous in nature and often skewed, so a normal approximation does not hold. The type of da...
Survival - Time to Failure
Analysis of survival data is more complex than other methods we've seen so far; we can't just take the mean survival time and a confidence interval to predict when the last patient will die. Also, survival times are unlikely to follow a Normal distribution,...
GLM for Correlated Data
So far the models we've covered assume independence between observations collected on separate individuals. When observations are correlated, models that incorporate the existing correlation in the data should be employed. Many approaches have been proposed ...
The Basics of Design
Data visualizations should be easy to interpret and look credible. To do this there are several factors that should be kept in focus, called Edward Tufte's Six Design Principles of Graphical Integrity[1]: The representation of numbers on the graphic should be prop...
Dynamic and Interactive Content
Thus far we've looked at building static content, but the backbone of D3.js is its beautiful transitions and dynamic updating capabilities. Intervals We need some way of repeatedly running code to change something the chart reacts to. The easiest way to d...
Layouts and Structured Data
Now that I've covered the basics of programming in D3, let's take a look at some of the other cool things one can build with D3. Before jumping into the code, it's worth mentioning the resources available within the D3 community for sharing reusable code. As o...
File Structure and Linked Views
After adding a lot of different event listeners, the JavaScript file can become messy. This section focuses on writing readable code in an 'Object Oriented' way for larger projects (but OOP will not be covered in depth here). Once a class is set up for a visua...
Intro to Spark / RDDs
Apache Spark Spark is a fast and general engine for large-scale data processing. The user writes a Driver Program containing the script that tells Spark what to do with the data, and Spark builds a Directed Acyclic Graph (DAG) to optimize workflow. With a ma...
DataFrames and Advanced Techniques
A Spark DataSet is an extension of the RDD object. It has rows, can run queries, and has a schema (which leads to more efficient storage and optimization). A DataFrame is just a DataSet of Row Objects, and unlike a DataSet the schema is inferred at runtime rat...
Scala
I'll start with a disclaimer: These are notes written by an experienced Java dev, thus some level of basic programming knowledge is required; I cannot possibly cover every unique feature of Scala, only what's most important to me. Awesome and free tutorials...