WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Gibbs in genetics

August 24, 2018

These notes are for Gilks, W. R., Richardson, S., & Spiegelhalter, D. (Eds.). (1995). Markov chain Monte Carlo in practice. CRC Press.

Introduction

  1. An essential feature of genetic studies is that they involve related individuals.
  2. Standard analysis methods of epidemiology, which assume independence, are therefore inappropriate.

Standard methods in genetics

Genetic terminology

  1. genes, alleles, diallelic, genotype,
  2. homozygous, heterozygous
  3. dominant trait, recessive, codominant
  4. fully penetrant, locus, partially penetrant
  5. polygene
  6. Hardy-Weinberg equilibrium
  7. Linkage analysis

Genetic models

  1. $y_i$: phenotype for subject $i=1,\ldots,I$
  2. $x_i$: measured risk factors
  3. $z_i$: polygene
  4. $P$: a probability

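Comparing phenotype and trait:

A trait refers to the sum of all the characteristics of an organism; it is determined by genes and must be heritable. A phenotype, by contrast, is the concrete expression of these gene-determined traits under the influence of the environment. It differs fundamentally from the concept of a trait, because a phenotype is not heritable. So the view that "phenotype" is just another name for "trait" is inconsistent with the concepts of genetics.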

A genetic model is specified in terms of two submodels:

  1. penetrance model $P(y\mid G, x,\Omega)$
  2. genotype model $P(G,z\mid\Theta)$

penetrance model:

  1. $y_i$ are conditionally independent given their genotype
  2. consider late-onset disease traits, characterized by a dichotomous disease status indicator $d$.
  3. an age variable $t$
  4. hazard function: $\lambda(t)$
  5. the penetrance for an unaffected individual is the probability of surviving to age $t$ free of the disease, $S(t)=\exp\left(-\int_0^t\lambda(u)\,du\right)$
  6. the penetrance for an affected individual is the density function $\lambda(t)S(t)$
  7. proportional hazards model:
  8. genotype model:
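
The two penetrance contributions listed above can be sketched numerically. This is only a minimal illustration, not the model from the book: it assumes a constant baseline hazard, and the names `baseline_hazard` and `relative_risk` (a multiplicative genotype effect on the hazard) are hypothetical choices made for this example.

```python
import math

def survival(t, hazard):
    """S(t) = exp(-hazard * t), valid only for a constant hazard (assumption)."""
    return math.exp(-hazard * t)

def penetrance(d, t, baseline_hazard, relative_risk=1.0):
    """Penetrance contribution for disease status d (0 or 1) at age t.

    Assumes the hazard is baseline_hazard * relative_risk, constant in age.
    d = 0 (unaffected): probability of surviving to age t, S(t).
    d = 1 (affected):   density at age t, lambda(t) * S(t).
    """
    hazard = baseline_hazard * relative_risk
    s = survival(t, hazard)
    return hazard * s if d == 1 else s

# Unaffected at age 50 versus affected at age 50, with a doubled hazard.
print(penetrance(0, 50.0, 0.001))
print(penetrance(1, 50.0, 0.001, relative_risk=2.0))
```

With an age-dependent hazard, the same structure holds but `survival` would have to integrate the hazard up to age $t$.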

segregation analysis.

New words

  1. nuisance
  2. frailty
  3. meiosis
  4. sperm
  5. pedigree
  6. daunting
  7. spouse

Continue reading



ECOC

August 18, 2018

These notes are for Dietterich, T. and Bakiri, G. (1995). Solving multiclass learning problems via error-correcting output codes, Journal of Artificial Intelligence Research 2: 263–286.

Abstract

Approaches to multiclass learning problems

  1. direct application of multiclass algorithms, such as C4.5 and CART
  2. application of binary concept learning algorithms to learn individual binary functions for each of the $k$ classes
  3. application of binary concept learning algorithms with distributed output representations.

New technique

Error-correcting codes are employed as a distributed output representation.

It is robust with respect to

  1. changes in the size of the training sample
  2. the assignment of distributed representations to particular classes
  3. the application of overfitting avoidance techniques such as decision-tree pruning.

And it can provide reliable class probability estimates.

Introduction

Two class

  1. decision-tree methods, such as C4.5 and CART.
  2. artificial neural network algorithms, such as the perceptron algorithm and the error backpropagation (BP) algorithm.

Multiclass

  1. direct multiclass approach: generalize decision-tree algorithms
  2. one-per-class approach: learn one binary function for each class with connectionist algorithms. ($f_i,i=1,2,\ldots,k$)
  3. distributed output code.
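
To make the distributed output code idea concrete, here is a minimal decoding sketch. Each class gets a codeword, one binary function is learned per bit, and a prediction is decoded to the class whose codeword is nearest in Hamming distance. The codebook and the class names `c1` to `c4` are illustrative assumptions; the codewords were chosen so that every pair differs in 4 bits, which lets a single wrong bit be corrected.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))

# Illustrative 7-bit codebook for four classes (pairwise Hamming distance 4).
CODEBOOK = {
    "c1": (1, 1, 1, 1, 1, 1, 1),
    "c2": (0, 0, 0, 0, 1, 1, 1),
    "c3": (0, 0, 1, 1, 0, 0, 1),
    "c4": (0, 1, 0, 1, 0, 1, 0),
}

def decode(bits, codebook=CODEBOOK):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(codebook, key=lambda label: hamming(codebook[label], bits))

# The seven binary classifiers output c2's codeword with the second bit flipped.
print(decode((0, 1, 0, 0, 1, 1, 1)))  # prints "c2"
```

In the one-per-class approach the codewords would be the rows of an identity matrix, whose pairwise distance is only 2, so no single-bit error can be corrected.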

Continue reading



Metabolic Network

August 08, 2018

Continue reading



Power Analysis

December 27, 2017

Continue reading



Notes: Model-Free Scoring System for Risk Prediction

October 17, 2017

Continue reading



Notes: Temporal-Difference Learning

October 12, 2017

Continue reading



Notes: Essentials of Survival Time Analysis

October 11, 2017

This post aims to clarify the relationship between rates and probabilities.

Continue reading



Notes: Stochastic Epidemic Models

October 11, 2017

This post discusses three different methods for formulating stochastic epidemic models.

Continue reading



An R Package: Fit Repeated Linear Regressions

September 26, 2017

Repeated linear regressions refer to a set of linear regressions that share several of the same variables.

Continue reading



A Faster Algorithm for Repeated Linear Regression

September 21, 2017

Repeated linear regression means fitting a linear regression many times, where the regressions share some common parts.

Continue reading



Notes: Persistence of species in the face of environmental stochasticity

September 18, 2017

Continue reading



Chain-Structured Models

September 08, 2017

Continue reading



Basic Principles of Monte Carlo

September 07, 2017

Continue reading



The Applications of Monte Carlo

September 07, 2017

Continue reading



Dynamics of Helicobacter pylori Infection

September 02, 2017

This post contains notes on this paper.

Continue reading



Healthy Human Microbiome

September 02, 2017

This post contains notes on this paper.

Continue reading



Dynamics of Helicobacter pylori colonization

September 01, 2017

This post contains notes on this paper.

Continue reading



Restricted Boltzmann Machines

August 26, 2017

Continue reading



Cox Regression

August 17, 2017

Survival analysis examines and models the time it takes for events to occur. It focuses on the distribution of survival times. There are many well-known methods for estimating the unconditional survival distribution, but most interesting survival modeling examines the relationship between survival and one or more predictors, usually termed covariates in the survival-analysis literature. The Cox proportional-hazards regression model is one of the most widely used methods of survival analysis.

Continue reading



Conjugate Gradient for Regression

August 13, 2017

The conjugate gradient method is an iterative method for solving a linear system of equations, so we can use the conjugate gradient method to estimate the parameters in (linear/ridge) regression.
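
As a rough sketch of that idea, the regression coefficients solve the normal equations $(X^TX+\lambda I)\beta = X^Ty$, a symmetric positive-definite system when $X$ has full column rank or $\lambda>0$, so conjugate gradient applies. The function names, the tolerance, and the iteration cap below are assumptions made for this example, written in plain Python rather than taken from the post.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive-definite matrix A."""
    n = len(b)
    max_iter = max_iter or 10 * n
    x = [0.0] * n
    r = list(b)            # residual b - A x, with x starting at 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)          # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # New direction is conjugate to the previous ones with respect to A.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

def ridge_by_cg(X, y, lam=0.0):
    """Estimate beta from (X^T X + lam I) beta = X^T y using conjugate gradient."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return conjugate_gradient(XtX, Xty)

# Points on the line y = 1 + 2x; the first column of X is the intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 3.0, 5.0]
print(ridge_by_cg(X, y))
```

In exact arithmetic the iteration terminates in at most as many steps as there are coefficients; forming $X^TX$ explicitly is done here only for readability.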

Continue reading



Bayesian Estimation for Linear Regression

August 13, 2017

Continue reading



Story about P value

August 09, 2017

“The p value was never meant to be used the way it’s used today.” –Goodman

Continue reading



Poisson Regression

August 01, 2017

Continue reading



Estimate Parameters in Logistic Regression

July 30, 2017

Continue reading



SMC in Biological Problems

July 22, 2017

Continue reading



The Gibbs Sampling

July 22, 2017

The Gibbs sampler is a special MCMC scheme. Its most prominent feature is that the underlying Markov chain is constructed by composing a sequence of conditional distributions along a set of directions.
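
A small sketch may help fix the idea. The target here, a standard bivariate normal with correlation `rho`, is an assumption chosen because both full conditionals are known in closed form: $x\mid y\sim N(\rho y, 1-\rho^2)$ and $y\mid x\sim N(\rho x, 1-\rho^2)$. The function name, burn-in length, and seed handling are likewise illustrative.

```python
import random

def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=None):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Each sweep updates one coordinate at a time from its full conditional.
    """
    rng = random.Random(seed)
    sd = (1.0 - rho ** 2) ** 0.5      # conditional standard deviation
    x, y = 0.0, 0.0                   # arbitrary starting point
    samples = []
    for step in range(burn_in + n_samples):
        x = rng.gauss(rho * y, sd)    # draw x given the current y
        y = rng.gauss(rho * x, sd)    # draw y given the new x
        if step >= burn_in:
            samples.append((x, y))
    return samples

draws = gibbs_bivariate_normal(rho=0.5, n_samples=5000, seed=1)
mean_xy = sum(x * y for x, y in draws) / len(draws)
print(mean_xy)  # should be close to rho
```

The two coordinate updates are the "directions" along which the chain moves; the closer `rho` is to 1, the more strongly successive draws are correlated and the slower the chain mixes.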

Continue reading



Metropolis Algorithm

July 21, 2017

Continue reading



A Bayesian Missing Data Problem

July 18, 2017

Continue reading



Model Specification

July 17, 2017

For a given time series, how do we choose appropriate values for $p, d, q$?

Continue reading



Growing A Polymer

July 17, 2017

Continue reading



See all posts →