Public Health Surveillance

BS728: A Methods Based Approach to Public Health Surveillance

Surveillance Defined

Surveillance is the ongoing systematic collection, analysis and interpretation of outcome-specific data for use in the planning, implementation and evaluation of public health practice. Surveillance can have a negative connotation, but we can use it to:

The authority for surveillance lies almost entirely at the state level. The CDC responds only when a disease has interstate implications or when it is invited by a state.

Modes of Surveillance

Active Surveillance
Passive Surveillance
Sentinel Surveillance
Syndromic Surveillance

Surveillance Systems Attributes

Data Sources and Reportable Disease

Electronic health records, birth and death registries and surveys are all examples of data sources for public health data. The CDC publishes a summary of reportable disease activity each week in the MMWR.

In Massachusetts (MA), diseases are reported through an electronic system called MAVEN.

After a drug is approved, passive surveillance is performed to detect adverse events. Health professionals or consumers can report suspected adverse events through MedWatch on the FDA site.

The National Center for Health Statistics (NCHS) administers national health surveys, oversees vital statistics, and archives national data.

Demographic and Health Surveys (DHS) are a tool that can be used in resource-poor settings and performed regularly.

The Emerging Infections Program (EIP) was established by the CDC in 1995. It is a network of 10 state health departments and their collaborators. Its work includes Active Bacterial Core surveillance (ABCs), FoodNet, and work on the impact of infectious diseases.

Public Health Action

Sampling

Terminology

Observation Unit: Object on which measurement is taken
Sampling Unit: A unit that can be selected for a sample
Target Population: The complete group we want to study and make statements about
Census: Survey designed to sample the entire population
Sample: Finite subset of the target population that is selected for measurement
Sampling Frame: List, map, etc. that shows all units from which a sample can come
Parameter/Statistic: A parameter is a numerical value that describes the population; a statistic is a numerical value computed from a sample
Estimator: Any statistic that approximates a parameter

Variance: How precise is the estimator? What are the sources of uncertainty?
Bias: How close is the estimator, on average, to the parameter?
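
The two ideas above are often combined into a single measure of estimator quality. As a short worked statement, using notation that is an assumption of this note (an estimator written as \hat{\theta} and the parameter it targets as \theta), the standard definitions and the mean squared error decomposition are:

```latex
\mathrm{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta, \qquad
\mathrm{Var}(\hat{\theta}) = E\big[(\hat{\theta} - E[\hat{\theta}])^2\big], \qquad
\mathrm{MSE}(\hat{\theta}) = E\big[(\hat{\theta} - \theta)^2\big]
  = \mathrm{Var}(\hat{\theta}) + \mathrm{Bias}(\hat{\theta})^2
```

An estimator can therefore be very precise (low variance) and still be wrong on average (high bias), which is why the sources of bias matter even for large samples.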

Sources of Bias

Convenience sample: Select units that are the easiest to get
Judgment sample: Purposely selecting a "representative" sample
Misspecify the target population
Undercoverage: Failing to include all of the target population in the sampling frame
Overcoverage: Include population units in the sampling frame that are not included in the target population
Nonresponse: Failing to get responses from all who were chosen to be in the sample
Sample consists entirely of volunteers
Measurement error: People may lie on sensitive questions, forget (recall bias), or be influenced by question wording or order

Central Limit Theorem

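A very important idea in sampling is that when we take a large random sample, the sample mean will be close to the true population mean, and the distribution of the sample mean across repeated samples is approximately normal, whatever the shape of the population distribution. It also tells us how "wide" the histogram of sample means is, or how much our sample mean could vary from the true mean (the standard error, which shrinks in proportion to the square root of the sample size).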
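
The behavior described above can be checked with a small simulation. The sketch below is only an illustration: the skewed population, the sample sizes and the number of repetitions are all made-up assumptions. It draws repeated simple random samples and compares the observed spread of the sample means with the theoretical standard error (population SD divided by the square root of n).

```python
import random
import statistics

def sample_means(population, n, reps, rng):
    """Means of `reps` simple random samples of size n (drawn without replacement)."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]

rng = random.Random(1)
# Hypothetical, strongly right-skewed population (exponential, mean about 10)
population = [rng.expovariate(1 / 10) for _ in range(20_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

results = {}
for n in (10, 100, 1000):
    means = sample_means(population, n, reps=1000, rng=rng)
    results[n] = {
        "mean_of_means": statistics.fmean(means),   # should sit near pop_mean
        "observed_se": statistics.stdev(means),     # spread of the histogram
        "theory_se": pop_sd / n ** 0.5,             # sigma / sqrt(n)
    }
    print(n, results[n])
```

Even though the population is far from normal, the observed spread of the sample means should track sigma / sqrt(n) closely and narrow as n grows.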

image-1667319781909.png

 

Sampling

Types of probability samples:

Complex designs can be necessary to extract valid or more precise information from a sample that is meant to represent a target population.

In a simple random sample each individual has an equal chance of being selected, but in cluster sampling we need to weight the observations if the clusters are different sizes.
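
To see why weighting matters, here is a minimal sketch under an assumed two-stage design: villages are drawn with equal probability, then a fixed number of residents is drawn within each. The village sizes, case counts and the helper `weighted_mean` are hypothetical. Each person is weighted by the inverse of their probability of ending up in the sample.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)

# Hypothetical design: 3 of 30 villages drawn with equal probability,
# then 10 residents drawn at random within each selected village.
total_clusters, sampled_clusters = 30, 3
villages = [
    {"size": 200, "sampled": 10, "cases": 2},
    {"size": 1000, "sampled": 10, "cases": 5},
    {"size": 50, "sampled": 10, "cases": 1},
]

values, weights = [], []
for v in villages:
    # Inclusion probability = P(village chosen) * P(person chosen | village chosen)
    p_include = (sampled_clusters / total_clusters) * (v["sampled"] / v["size"])
    for i in range(v["sampled"]):
        values.append(1 if i < v["cases"] else 0)  # 1 = case, 0 = non-case
        weights.append(1 / p_include)              # inverse-probability weight

unweighted = sum(values) / len(values)
weighted = weighted_mean(values, weights)
print(round(unweighted, 3), round(weighted, 3))  # 0.267 0.436
```

The unweighted prevalence treats a resident of the 50-person village the same as a resident of the 1000-person village, even though the latter stands in for far more people; the weighted estimate corrects for that.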

Missing Data

Missing Completely at Random (MCAR)

The probability an individual value will be missing does not depend on the outcome, any collected variables, variables not collected or the survey design

Missing at Random (MAR)

The probability an individual value is missing is independent of the outcome of interest and unobserved variables, but depends on the covariates in the model. In other words the response rate only depends on observed data. 

Non-ignorable missing data

The probability an individual value is missing depends on unobserved variables and cannot be completely explained by variables that have been collected

What to do?

  1. Ignore it
    • Worst approach, as it reduces sample size and power and can bias estimates unless the data are MCAR
  2. Prevent it
    • Try to design the survey to minimize non-response
  3. Statistical methods
    • Imputation - Estimating missing values from information from other observations
      • Divide data into homogeneous strata and determine the variables to impute
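
A minimal sketch of stratum-based imputation follows, using simulated data in which missingness depends only on an observed stratum (an MAR setting). The strata, outcome values and missingness rates are hypothetical, and stratum-mean imputation is just one simple choice among many imputation methods.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical survey: the outcome differs by age group, and older respondents
# are less likely to answer. Missingness depends only on the observed stratum.
records = []
for _ in range(5000):
    stratum = rng.choice(["young", "old"])
    outcome = rng.gauss(50 if stratum == "young" else 70, 5)
    p_missing = 0.1 if stratum == "young" else 0.6
    observed = None if rng.random() < p_missing else outcome
    records.append({"stratum": stratum, "true": outcome, "observed": observed})

true_mean = statistics.fmean(r["true"] for r in records)

# Option 1: ignore the missing values (complete-case analysis)
complete_case_mean = statistics.fmean(
    r["observed"] for r in records if r["observed"] is not None
)

# Option 3: impute each missing value with the mean of its stratum
stratum_means = {
    s: statistics.fmean(
        r["observed"] for r in records
        if r["stratum"] == s and r["observed"] is not None
    )
    for s in ("young", "old")
}
imputed = [
    r["observed"] if r["observed"] is not None else stratum_means[r["stratum"]]
    for r in records
]
imputed_mean = statistics.fmean(imputed)

print(round(true_mean, 1), round(complete_case_mean, 1), round(imputed_mean, 1))
```

In this setup the complete-case mean should be pulled toward the younger group, while the imputed mean should land close to the true mean. Note that filling in a single mean per stratum understates the variability of the data, so standard errors computed naively from the imputed values will be too small.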

Sampling Strategy

We could determine whether an intervention is meeting its target through a census, a sampling plan, or Lot Quality Assurance Sampling (LQAS).

LQAS

image-1669731978116.png

As always, increasing the sample size decreases error. We can also decrease alpha.

LQAS classifies whether something is likely to meet the threshold or not. It does not measure prevalence or probability.
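
The classification logic can be sketched with exact binomial probabilities. This is an illustration under assumptions that do not come from the notes: a sample of n = 19, a decision rule of d = 13, an upper threshold of 80% and a lower threshold of 50%. The function names and the convention for which error is called alpha are also assumptions.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def lqas_errors(n, d, p_upper, p_lower):
    """Misclassification risks for the rule 'meets target if successes >= d'.

    alpha: an area truly at p_upper is classified as below target.
    beta:  an area truly at p_lower is classified as meeting target.
    (This labelling of alpha and beta is an assumed convention.)
    """
    alpha = binom_cdf(d - 1, n, p_upper)
    beta = 1 - binom_cdf(d - 1, n, p_lower)
    return alpha, beta

def classify(successes, d):
    """Binary LQAS decision; it says nothing about the actual prevalence."""
    return "meets target" if successes >= d else "below target"

# Hypothetical plan: sample 19 people, require at least 13 successes
alpha, beta = lqas_errors(n=19, d=13, p_upper=0.80, p_lower=0.50)
print(round(alpha, 3), round(beta, 3))
print(classify(14, d=13), "|", classify(11, d=13))
```

For a fixed sample size, moving the decision rule d trades one error against the other; reducing both at once requires a larger sample.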

Time Series

A time series is a set of data points collected over time. These might be measurements of a daily process, and they are quite common in surveillance. In statistical models for time series, the observations (X or y) are indexed by time and may or may not be independent and identically distributed.

Collection tools and methods

When modeling a time series we can choose a retrospective or prospective approach to analysis. This allows us to use time series for event detection, interpretation of past results, forecasting or decision making.

General Approach
  1. Plot the series and examine the main features
    • Trend
    • Seasonal component
    • Any apparent changes in behavior
    • Any outlying observations
  2. Model trend and seasonal components to get stationary residuals
    • Regression methods are useful for this
  3. Choose a model to fit the residuals
  4. In some settings, forecasting can be performed by forecasting the residuals and then adding back the predicted trend and seasonal components
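
Steps 2 and 4 can be sketched with ordinary least squares. This is only an illustration: the weekly counts are simulated, the model (a linear trend plus one sine/cosine pair with a 52-week period) is an assumed form, and the helper names are hypothetical. Step 3, fitting a model to the residuals, is not shown, so the forecast treats the residual forecast as zero.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def design_row(t, period):
    """Intercept, linear trend, and one sine/cosine pair for the seasonal component."""
    angle = 2 * math.pi * t / period
    return [1.0, float(t), math.sin(angle), math.cos(angle)]

def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(beta, row):
    return sum(b * x for b, x in zip(beta, row))

# Hypothetical weekly counts: upward trend + annual cycle + noise
rng = random.Random(0)
period = 52
weeks = list(range(208))
counts = [
    100 + 0.5 * t + 20 * math.sin(2 * math.pi * t / period) + rng.gauss(0, 5)
    for t in weeks
]

# Step 2: model trend and seasonality, keep the residuals
rows = [design_row(t, period) for t in weeks]
beta = fit_ols(rows, counts)
residuals = [y - predict(beta, r) for y, r in zip(counts, rows)]

# Step 4: forecast = predicted trend + seasonal component (+ residual forecast, here 0)
forecast = predict(beta, design_row(210, period))
print([round(b, 2) for b in beta], round(forecast, 1))
```

Plotting `residuals` against time is a quick check of whether the trend and seasonal terms have left something that looks stationary before choosing a residual model.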

Once we have a time series model we can apply it to a retrospective or prospective application.