These notes are for Gilks, W. R., Richardson, S., & Spiegelhalter, D. (Eds.). (1995). *Markov chain Monte Carlo in practice*. CRC Press.

- essential feature of genetic studies: they involve related individuals.
- standard epidemiological analysis methods, which assume independence between subjects, are therefore inappropriate.

- genes, alleles, diallelic, genotype
- homozygous, heterozygous
- dominant trait, recessive, codominant
- fully penetrant, locus, partially penetrant
- polygene
- Hardy-Weinberg equilibrium
- Linkage analysis
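Hardy-Weinberg equilibrium can be made concrete for a diallelic locus: if allele A has frequency $p$ and allele a has frequency $q=1-p$, the genotype frequencies are $p^2$ (AA), $2pq$ (Aa) and $q^2$ (aa). A minimal Python sketch (the function name is illustrative):

```python
def hardy_weinberg(p):
    """Genotype frequencies at a diallelic locus under Hardy-Weinberg equilibrium.

    p is the frequency of allele A and q = 1 - p the frequency of allele a.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


freqs = hardy_weinberg(0.1)
# AA ≈ 0.01, Aa ≈ 0.18, aa ≈ 0.81; the three frequencies sum to 1
```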

- $y_i$: phenotype for subject $i=1,\ldots,I$
- $x_i$: measured risk factors
- $G_i$: major-locus genotype
- $z_i$: polygene (polygenic component)
- $P$: a probability

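Comparing phenotype and trait:

A trait is the sum of all the characteristics of an organism; it is determined by genes and must be heritable. A phenotype, in contrast, is the concrete expression of these gene-determined traits under the action of the environment, which makes it essentially different from the concept of a trait: a phenotype is not heritable. The view that "phenotype" is just another name for "trait" is therefore inconsistent with genetic concepts.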

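A genetic model is specified in terms of two submodels:

- penetrance model $P(y\mid G, x,\Omega)$: the distribution of phenotypes given genotypes and risk factors
- genotype model $P(G,z\mid\Theta)$: the joint distribution of major genotypes and polygenes in the family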
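For a pedigree, the genotype model typically factorizes into population probabilities for founders (e.g. Hardy-Weinberg frequencies) and Mendelian transmission probabilities for non-founders given their parents. A minimal sketch for one diallelic major locus, ignoring the polygene $z$ and coding genotypes as the number of copies of allele A; all function names are illustrative:

```python
from itertools import product


def transmission_prob(g_mother, g_father):
    """P(child genotype | parental genotypes) at a diallelic locus.

    Genotypes are coded as copies of allele A (0, 1 or 2). Each parent
    independently transmits A with probability g / 2 (Mendel's first law).
    Returns [P(child=0), P(child=1), P(child=2)].
    """
    probs = [0.0, 0.0, 0.0]
    pm, pf = g_mother / 2, g_father / 2
    for a_m, a_f in product((0, 1), repeat=2):  # allele passed on by each parent
        w = (pm if a_m else 1 - pm) * (pf if a_f else 1 - pf)
        probs[a_m + a_f] += w
    return probs


def founder_prob(g, p):
    """Hardy-Weinberg probability of a founder genotype g, allele frequency p."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2][g]


def nuclear_family_prob(g_mother, g_father, children, p):
    """P(G) for a nuclear family: founders times transmission for each child."""
    prob = founder_prob(g_mother, p) * founder_prob(g_father, p)
    t = transmission_prob(g_mother, g_father)
    for g in children:
        prob *= t[g]
    return prob


print(transmission_prob(1, 1))  # [0.25, 0.5, 0.25]
```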

penetrance model:

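- the $y_i$ are conditionally independent given their genotypes
- consider late-onset disease traits, characterized by a dichotomous disease status indicator $d$ (1 = affected, 0 = unaffected)
- and an age variable $t$ (age at onset for affected individuals, age at last observation for unaffected ones)
- hazard function: $\lambda(t)$, with survival function $S(t)=\exp\left(-\int_0^t\lambda(u)\,du\right)$
- the penetrance for an unaffected individual is the probability $S(t)$ of surviving to age $t$ free of the disease
- the penetrance for an affected individual is the density function $\lambda(t)S(t)$; the two cases combine as $\lambda(t)^{d}S(t)$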
- proportional hazards model:
- genotype model:
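A minimal sketch of the penetrance $\lambda(t)^d S(t)$ under a proportional hazards model, assuming (hypothetically) a constant baseline hazard $\lambda_0$, so that $S(t)=e^{-\lambda t}$, and a genotype effect $\lambda(t\mid G)=\lambda_0 e^{\gamma G}$ with $G$ the number of risk alleles. The parameter names `baseline_hazard` and `log_rr` are illustrative, not from the source:

```python
import math


def penetrance(d, t, genotype, baseline_hazard, log_rr):
    """Penetrance lambda(t)^d * S(t) for a late-onset disease.

    Hypothetical model: constant baseline hazard with proportional hazards,
    lambda(t | G) = baseline_hazard * exp(log_rr * G), G = 0, 1 or 2.
    d = 1 if affected at age t (onset density), 0 if unaffected at age t.
    """
    hazard = baseline_hazard * math.exp(log_rr * genotype)
    survival = math.exp(-hazard * t)  # S(t) = exp(-cumulative hazard)
    return hazard * survival if d == 1 else survival


# Unaffected 60-year-old non-carrier vs. affected carrier at 60
p_unaffected = penetrance(0, 60, 0, 0.001, math.log(3))
p_affected = penetrance(1, 60, 1, 0.001, math.log(3))
```

With an age-dependent baseline hazard, `survival` would use the cumulative hazard $\int_0^t\lambda(u)\,du$ instead of $\lambda t$.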

segregation analysis.

- nuisance
- frailty
- meiosis
- sperm
- pedigree
- daunting
- spouse

- direct application of multiclass algorithms, such as C4.5 and CART
- application of binary concept learning algorithms to learn individual binary functions for each of the $k$ classes
- application of binary concept learning algorithms with distributed output representations.

Error-correcting codes are employed as a distributed output representation.

The error-correcting code approach is robust with respect to

- changes in the size of the training sample
- the assignment of distributed representations to particular classes
- the application of overfitting avoidance techniques such as decision-tree pruning.

It can also provide reliable class probability estimates.

- decision-tree methods, such as C4.5 and CART.
- artificial neural network algorithms, such as the perceptron algorithm and the error backpropagation (BP) algorithm.

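- direct multiclass approach: generalize decision-tree algorithms to handle all $k$ classes directly
- one-per-class approach: learn one binary function $f_i$, $i=1,2,\ldots,k$, for each class, e.g. with connectionist algorithms
- distributed output code: assign each class a binary codeword and learn one binary function per codeword bit.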
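A minimal sketch of the distributed output code idea, assuming an illustrative 7-bit code matrix for $k=4$ classes: each class gets a codeword, one binary learner would be trained per column, and a new example is assigned to the class whose codeword is nearest in Hamming distance to the vector of predicted bits. Training the binary learners is omitted; `CODE`, `class_targets` and `decode` are hypothetical names.

```python
import numpy as np

# Illustrative code matrix: one 7-bit codeword (row) per class, k = 4.
# Every pair of rows differs in 4 positions, so a single wrong bit in the
# predicted vector still decodes to the correct class.
CODE = np.array([
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1, 0],
])


def class_targets(labels, code=CODE):
    """Binary targets: column j is the training target for the j-th binary learner."""
    return code[np.asarray(labels)]


def decode(bits, code=CODE):
    """Return the class whose codeword has minimum Hamming distance to bits."""
    distances = np.sum(code != np.asarray(bits), axis=1)
    return int(np.argmin(distances))
```

For example, if the seven learners output `[0, 0, 1, 1, 0, 1, 1]` (class 2's codeword with bit 6 flipped), `decode` still returns class 2.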

This post aims to clarify the relationship between rates and probabilities.
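For a constant rate $r$ (events per unit time, i.e. exponential waiting times), the probability of at least one event in an interval of length $t$ is $p = 1-e^{-rt}$, and conversely $r=-\ln(1-p)/t$. A short sketch under that constant-rate assumption (function names are illustrative):

```python
import math


def rate_to_prob(rate, t=1.0):
    """Probability of at least one event within time t, for a constant rate."""
    return 1.0 - math.exp(-rate * t)


def prob_to_rate(prob, t=1.0):
    """Constant rate that yields probability prob of an event within time t."""
    return -math.log(1.0 - prob) / t
```

For example, a rate of 0.1 per year over 5 years gives $p = 1-e^{-0.5}\approx 0.39$, not $0.1\times 5 = 0.5$; the product $rt$ is only a good approximation when $rt$ is small.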

This post discusses three different methods for formulating stochastic epidemic models.
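As an illustration of one common formulation, an SIR model can be written as a continuous-time Markov chain and simulated exactly with the Gillespie algorithm. A minimal sketch, assuming a closed population of size $N$, infection rate $\beta SI/N$ and recovery rate $\gamma I$ (function and parameter names are illustrative):

```python
import random


def gillespie_sir(n, i0, beta, gamma, seed=0):
    """Exact simulation of a stochastic SIR continuous-time Markov chain.

    Events: infection S -> I at rate beta * S * I / n,
            recovery  I -> R at rate gamma * I.
    Returns a list of (time, S, I, R) tuples, one per event, until I = 0.
    """
    rng = random.Random(seed)
    t, s, i, r = 0.0, n - i0, i0, 0
    path = [(t, s, i, r)]
    while i > 0:
        infection = beta * s * i / n
        recovery = gamma * i
        total = infection + recovery
        t += rng.expovariate(total)           # exponential waiting time to next event
        if rng.random() * total < infection:  # pick the event in proportion to its rate
            s, i = s - 1, i + 1
        else:
            i, r = i - 1, r + 1
        path.append((t, s, i, r))
    return path


final_time, s_end, i_end, r_end = gillespie_sir(100, 1, beta=2.0, gamma=1.0)[-1]
```

Because the chain is stochastic, repeated runs with different seeds give different outbreak sizes, including early extinction when the first case recovers before infecting anyone.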

*Repeated Linear Regressions* refers to a set of linear regressions that share several of the same variables.

*Repeated Linear Regression* means fitting a linear regression many times, where the regressions have some parts in common.
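When every regression shares the same common covariates and differs only in one extra variable, the shared part can be handled once. By the Frisch-Waugh-Lovell theorem, the coefficient on the extra variable equals the slope from regressing the residualized response on the residualized variable, where residualizing means projecting out the common covariates. A minimal NumPy sketch under that assumption (one extra variable per regression; names are illustrative):

```python
import numpy as np


def residualize(q, a):
    """Remove the component of a (vector or matrix) lying in the span of q."""
    return a - q @ (q.T @ a)


def repeated_regressions(y, common, candidates):
    """Coefficient of each column x_j of candidates in the fit y ~ [common, x_j].

    The orthonormal basis of the shared design is computed once via QR and
    reused for every regression (Frisch-Waugh-Lovell).
    """
    q, _ = np.linalg.qr(common)
    y_r = residualize(q, y)
    x_r = residualize(q, candidates)  # residualizes all candidate columns at once
    return (x_r.T @ y_r) / np.sum(x_r * x_r, axis=0)
```

The cost of handling the common covariates is paid once, instead of once per regression.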

This post contains notes on this paper.


**Survival analysis** examines and models the time it takes for events to occur, focusing on the distribution of survival times. There are many well-known methods for estimating the unconditional survival distribution; regression methods go further and examine the relationship between survival and one or more predictors, usually termed **covariates** in the survival-analysis literature. The Cox proportional-hazards regression model is one of the most widely used methods of survival analysis.
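The Cox model leaves the baseline hazard unspecified and estimates the regression coefficient by maximizing the partial likelihood $\ell(\beta)=\sum_{i:\,d_i=1}\left[\beta x_i-\log\sum_{j:\,t_j\ge t_i}e^{\beta x_j}\right]$. A minimal Newton-Raphson sketch for a single covariate, assuming no tied event times (so no Breslow or Efron correction is needed); the function names are illustrative:

```python
import math


def score_and_information(beta, times, events, x):
    """Score and observed information of the Cox partial log-likelihood (1 covariate)."""
    score, info = 0.0, 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]  # risk set at t_i
        w = [math.exp(beta * x_j) for x_j in risk]
        s0 = sum(w)
        s1 = sum(wj * xj for wj, xj in zip(w, risk))
        s2 = sum(wj * xj * xj for wj, xj in zip(w, risk))
        mean = s1 / s0  # risk-weighted covariate mean
        score += x_i - mean
        info += s2 / s0 - mean * mean
    return score, info


def cox_fit_1d(times, events, x, tol=1e-10, max_iter=50):
    """Maximize the partial likelihood by Newton-Raphson, starting at beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta
```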

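The conjugate gradient method is an iterative method for solving a linear system of equations $Ax=b$ whose matrix $A$ is symmetric and positive definite. Since the regression estimates solve the normal equations $(X^\top X+\lambda I)\beta=X^\top y$ (ridge regression with $\lambda>0$, or ordinary least squares with $\lambda=0$ when $X$ has full column rank), we can use the conjugate gradient method to estimate the parameters in (linear/ridge) regression.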
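A minimal sketch of conjugate gradient applied to the ridge normal equations above; function names are illustrative. Forming $X^\top X$ explicitly is fine for small problems; for large sparse $X$ one would instead compute $X^\top(Xv)$ inside the loop.

```python
import numpy as np


def conjugate_gradient(a, b, tol=1e-10, max_iter=None):
    """Solve a @ x = b for a symmetric positive-definite matrix a."""
    x = np.zeros_like(b, dtype=float)
    r = b - a @ x            # residual
    p = r.copy()             # search direction
    rs_old = r @ r
    if np.sqrt(rs_old) < tol:
        return x
    for _ in range(max_iter or 10 * len(b)):
        ap = a @ p
        alpha = rs_old / (p @ ap)   # exact line search along p
        x += alpha * p
        r -= alpha * ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p  # next direction, a-conjugate to the previous ones
        rs_old = rs_new
    return x


def ridge_cg(X, y, lam):
    """Ridge estimate: solve (X^T X + lam I) beta = X^T y by conjugate gradient."""
    a = X.T @ X + lam * np.eye(X.shape[1])
    return conjugate_gradient(a, X.T @ y)
```

In exact arithmetic CG converges in at most as many iterations as there are parameters; the larger default cap only guards against rounding error.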

“The p value was never meant to be used the way it’s used today.” –Goodman

The Gibbs sampler is a special MCMC scheme. Its most prominent feature is that the underlying Markov chain is constructed by composing a sequence of conditional distributions along a set of directions.
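As an illustration, consider a standard bivariate normal with correlation $\rho$: both full conditionals are univariate normals, $x\mid y\sim N(\rho y,1-\rho^2)$ and $y\mid x\sim N(\rho x,1-\rho^2)$, so the sampler alternates updates along the two coordinate directions. A minimal sketch (target distribution and names chosen for illustration):

```python
import numpy as np


def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(1.0 - rho ** 2)  # conditional standard deviation
    x, y = 0.0, 0.0
    draws = np.empty((n_samples, 2))
    for k in range(n_samples):
        x = rng.normal(rho * y, sd)  # draw from x | y (update along x)
        y = rng.normal(rho * x, sd)  # draw from y | x (update along y)
        draws[k] = x, y
    return draws


samples = gibbs_bivariate_normal(0.8, 20000)[1000:]  # discard burn-in
```

The closer $|\rho|$ is to 1, the smaller each coordinate move and the slower the chain mixes.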

For a given time series, how should we choose appropriate values of $p, d, q$ for an ARIMA$(p,d,q)$ model?
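One common recipe is to pick $d$ as the number of differences needed to make the series look stationary, then compare candidate orders with an information criterion such as AIC. The sketch below is deliberately simplified: it assumes $d$ is already chosen, ignores MA terms ($q=0$), and fits AR($p$) by ordinary least squares; `select_ar_order` is a hypothetical helper, not a library function.

```python
import numpy as np


def select_ar_order(series, d, max_p):
    """Choose p for an ARIMA(p, d, 0) model by AIC, fitting AR(p) by least squares.

    Every candidate is fitted on the same effective sample (the first max_p
    points are held back), so the AIC values are comparable across p.
    """
    z = np.diff(np.asarray(series, dtype=float), n=d)  # difference d times
    n_eff = len(z) - max_p
    target = z[max_p:]
    best_p, best_aic = None, np.inf
    for p in range(max_p + 1):
        lags = [z[max_p - k:len(z) - k] for k in range(1, p + 1)]  # lag-k regressors
        design = np.column_stack([np.ones(n_eff)] + lags)
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        sigma2 = np.mean((target - design @ coef) ** 2)
        aic = n_eff * np.log(sigma2) + 2 * (p + 1)  # Gaussian AIC up to a constant
        if aic < best_aic:
            best_p, best_aic = p, aic
    return best_p
```

A full ARIMA fitter would be needed to compare models with MA terms, since those cannot be estimated by a single least-squares regression.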