# WeiYa's Work Yard

## Introduction

1. The essential feature of genetic studies is that they involve related individuals.
2. The standard analysis methods of epidemiology, which assume independence, are therefore inappropriate.

## Standard methods in genetics

### Genetic terminology

1. genes, alleles, diallelic, genotype,
2. homozygous, heterozygous
3. dominant trait, recessive, codominant
4. fully penetrant, locus, partially penetrant
5. polygene
6. Hardy-Weinberg equilibrium

### Genetic models

1. $y_i$: phenotype for subject $i=1,\ldots,I$
2. $x_i$: measured risk factors
3. $z_i$: polygene
4. $P$: a probability

a genetic model is specified in terms of two submodels.

1. penetrance model $P(y\mid G, x,\Omega)$
2. genotype model $P(G,z\mid\Theta)$

penetrance model:

1. $y_i$ are conditionally independent given their genotype
2. consider late-onset disease traits, characterized by a dichotomous disease status indicator $d$.
3. an age variable $t$
4. hazard function: $\lambda(t)$
5. the penetrance for an unaffected individual is the probability of surviving to age $t$ free of the disease, $S(t) = \exp[-\Lambda(t)]$, where $\Lambda(t)=\int_0^t\lambda(u)\,du$
6. the penetrance for an affected individual is the density function $\lambda(t)S(t)$
7. proportional hazards model: $\lambda(t, G, x, z)=\lambda_0(t)\exp\{\beta x+\gamma\cdot \mathrm{dom}(G)+\eta z+\ldots\}$
8. genotype model: $P(G\mid\Theta)=P(G_1\mid\Theta)\prod\limits_{i=2}^IP(G_i\mid G_1,\ldots,G_{i-1};\Theta)$
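
The penetrance formulas above can be sketched in a few lines. This is only an illustration under stated assumptions: the baseline hazard $\lambda_0$ is taken to be constant so that $\Lambda(t)$ has a closed form, genotypes are written as two-letter strings with `A` as the risk allele, and all function names and parameter values are hypothetical.

```python
import math

def dom(genotype):
    """Dominant coding: 1 if the genotype carries at least one risk allele 'A'."""
    return 1 if "A" in genotype else 0

def hazard(genotype, x, z, lam0=0.01, beta=0.5, gamma=1.0, eta=0.3):
    """Proportional hazards with a constant baseline hazard lam0 (an assumption)."""
    return lam0 * math.exp(beta * x + gamma * dom(genotype) + eta * z)

def penetrance(d, t, genotype, x=0.0, z=0.0, **params):
    """S(t) for an unaffected subject (d=0), lambda(t) * S(t) for an affected one (d=1)."""
    lam = hazard(genotype, x, z, **params)
    survival = math.exp(-lam * t)  # S(t) = exp(-Lambda(t)) with Lambda(t) = lam * t
    return lam * survival if d == 1 else survival
```

For example, `penetrance(0, 50, "Aa")` is smaller than `penetrance(0, 50, "aa")`, because carriers have a higher hazard and are less likely to reach age 50 disease-free.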

segregation analysis.

1. nuisance
2. frailty
3. meiosis
4. sperm
5. pedigree
6. daunting
7. spouse

## Abstract

### Approaches to multiclass learning problems

1. direct application of multiclass algorithms, such as C4.5 and CART
2. application of binary concept learning algorithms to learn individual binary functions for each of the $k$ classes
3. application of binary concept learning algorithms with distributed output representations.

### New technique

Error-correcting codes are employed as a distributed output representation

It is robust with respect to

1. changes in the size of the training sample
2. assignment of distributed representations to particular classes
3. application of overfitting avoidance techniques such as decision-tree pruning.

And it can provide reliable class probability estimates.
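
To make the idea of an error-correcting output code concrete, here is a minimal decoding sketch. The 4-class, 7-bit code table is a hypothetical example (any two codewords differ in 4 bits), and each bit stands for the output of one learned binary function; the training of those functions is not shown.

```python
# Hypothetical code table: one 7-bit codeword per class.
CODEWORDS = {
    "class_0": (1, 1, 1, 1, 1, 1, 1),
    "class_1": (0, 0, 0, 0, 1, 1, 1),
    "class_2": (0, 0, 1, 1, 0, 0, 1),
    "class_3": (0, 1, 0, 1, 0, 1, 0),
}

def hamming(a, b):
    """Number of positions in which two bit strings differ."""
    return sum(p != q for p, q in zip(a, b))

def decode(bits):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(CODEWORDS, key=lambda c: hamming(CODEWORDS[c], bits))
```

With this table, `decode((0, 0, 0, 0, 1, 1, 0))` still returns `"class_1"` even though the last binary function made an error, which is the sense in which the code is error-correcting.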

## Introduction

### Two class

1. decision-tree methods, such as C4.5 and CART.
2. artificial neural network algorithms, such as the perceptron algorithm and the error backpropagation (BP) algorithm.

### Multiclass

1. direct multiclass approach: generalize decision-tree algorithms
2. one-per-class approach: learn one binary function for each class with connectionist algorithms. ($f_i,i=1,2,\ldots,k$)
3. distributed output code.

## Notes: Essentials of Survival Time Analysis

##### October 11, 2017

This post aims to clarify the relationship between rates and probabilities.
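
One standard version of that relationship, assuming a constant event rate over the interval (an assumption of this sketch, not necessarily the setting of the post), is $p = 1 - e^{-rt}$. The function names below are hypothetical.

```python
import math

def rate_to_prob(rate, t=1.0):
    """Probability of at least one event within time t under a constant rate."""
    return 1.0 - math.exp(-rate * t)

def prob_to_rate(prob, t=1.0):
    """Constant rate that yields event probability prob over time t."""
    return -math.log(1.0 - prob) / t
```

For small rates the two are nearly equal, and they diverge as the rate grows: a rate can exceed 1, while a probability cannot.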

## Notes: Stochastic Epidemic Models

##### October 11, 2017

This post discusses three different methods for formulating stochastic epidemic models.

## An R Package: Fit Repeated Linear Regressions

##### September 26, 2017

Repeated linear regressions are a set of linear regressions that share several of the same variables.

## A Faster Algorithm for Repeated Linear Regression

##### September 21, 2017

Repeated linear regression means fitting a linear regression many times, where the regressions share some common parts.
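
One way to exploit the common parts, sketched here under assumptions, is the Frisch-Waugh-Lovell theorem: for models of the form `y ~ 1 + c + x_j`, the shared intercept and covariate `c` can be projected out of `y` once, so that each regression reduces to a one-dimensional fit. This is not necessarily the algorithm used in the post, and all names are hypothetical.

```python
def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def make_residualizer(c):
    """Return a function that removes the intercept and the shared covariate c."""
    cc = center(c)
    cc_ss = dot(cc, cc)  # computed once, reused for every regression
    def residualize(v):
        vc = center(v)
        b = dot(vc, cc) / cc_ss
        return [a - b * k for a, k in zip(vc, cc)]
    return residualize

def repeated_slopes(y, xs, c):
    """Coefficient of x_j in y ~ 1 + c + x_j, for each x_j in xs."""
    residualize = make_residualizer(c)
    ry = residualize(y)  # shared across all regressions
    slopes = []
    for x in xs:
        rx = residualize(x)
        slopes.append(dot(rx, ry) / dot(rx, rx))
    return slopes
```

The saving comes from computing `ry` and the statistics of `c` only once, however many `x_j` are tested.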

## Dynamics of Helicobacter pylori Infection

##### September 02, 2017

This post contains notes on this paper.

## Healthy Human Microbiome

##### September 02, 2017

This post contains notes on this paper.

## Dynamics of Helicobacter pylori colonization

##### September 01, 2017

This post contains notes on this paper.

## Cox Regression

##### August 17, 2017

Survival analysis examines and models the time it takes for events to occur. It focuses on the distribution of survival times. There are many well-known methods for estimating the unconditional survival distribution, but most interesting survival modeling examines the relationship between survival and one or more predictors, usually termed covariates in the survival-analysis literature. The Cox proportional-hazards regression model is one of the most widely used methods of survival analysis.
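
As a rough illustration of what fitting a Cox model optimizes, the sketch below evaluates the log partial likelihood for a single covariate. It assumes no tied event times, and the function and argument names are hypothetical; a real analysis would use an established survival-analysis package.

```python
import math

def cox_log_partial_likelihood(beta, times, events, x):
    """Log partial likelihood for one covariate, assuming no tied event times.

    events[i] is 1 if subject i had the event at times[i], 0 if censored.
    """
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = sum(math.exp(beta * x[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        total += beta * x[i] - math.log(risk)
    return total
```

Note that the baseline hazard never appears: it cancels in each ratio, which is why the coefficient can be estimated without specifying it.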

## Conjugate Gradient for Regression

##### August 13, 2017

The conjugate gradient method is an iterative method for solving a linear system of equations, so we can use the conjugate gradient method to estimate the parameters in (linear/ridge) regression.
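
A minimal pure-Python sketch of this idea solves the normal equations $(X^\top X + \lambda I)\beta = X^\top y$ by conjugate gradient. It assumes $X$ is given as a list of rows with full column rank (or $\lambda > 0$), and the function names are hypothetical.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def normal_matvec(X, v, lam):
    """Compute (X^T X + lam * I) v without forming X^T X."""
    Xv = [dot(row, v) for row in X]
    return [sum(X[i][j] * Xv[i] for i in range(len(X))) + lam * v[j]
            for j in range(len(v))]

def cg_ridge(X, y, lam=0.0, tol=1e-10, max_iter=None):
    """Ridge (lam > 0) or least-squares (lam = 0) coefficients via conjugate gradient."""
    p = len(X[0])
    b = [sum(X[i][j] * y[i] for i in range(len(X))) for j in range(p)]  # X^T y
    beta = [0.0] * p
    r = b[:]   # residual b - A beta for the starting point beta = 0
    d = r[:]   # first search direction
    rs = dot(r, r)
    for _ in range(max_iter or 10 * p):
        if rs ** 0.5 < tol:
            break
        Ad = normal_matvec(X, d, lam)
        alpha = rs / dot(d, Ad)
        beta = [bj + alpha * dj for bj, dj in zip(beta, d)]
        r = [rj - alpha * aj for rj, aj in zip(r, Ad)]
        rs_new = dot(r, r)
        d = [rj + (rs_new / rs) * dj for rj, dj in zip(r, d)]
        rs = rs_new
    return beta
```

Setting `lam=0.0` gives ordinary least squares; a positive `lam` gives ridge regression and also improves the conditioning of the system.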

## Story about P value

##### August 09, 2017

“The p value was never meant to be used the way it’s used today.” –Goodman

## The Gibbs Sampling

##### July 22, 2017

The Gibbs sampler is a special MCMC scheme. Its most prominent feature is that the underlying Markov chain is constructed by composing a sequence of conditional distributions along a set of directions.
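
A small sketch of this construction, for a standard bivariate normal with correlation $\rho$ (a textbook target chosen here for illustration, not taken from the post): the two full conditionals are $x \mid y \sim N(\rho y, 1-\rho^2)$ and $y \mid x \sim N(\rho x, 1-\rho^2)$, and the chain alternates between them. The function name and defaults are hypothetical.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Sample from a standard bivariate normal by alternating conditional draws."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)  # conditional standard deviation
    x, y = 0.0, 0.0
    samples = []
    for step in range(burn_in + n_samples):
        x = rng.gauss(rho * y, sd)  # draw x given the current y
        y = rng.gauss(rho * x, sd)  # draw y given the new x
        if step >= burn_in:
            samples.append((x, y))
    return samples
```

Each update moves along one coordinate direction only, so consecutive samples are correlated; the stronger the correlation $\rho$, the more slowly the chain explores the target.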

## Model Specification

##### July 17, 2017

For a given time series, how do we choose appropriate values for $p, d, q$?