Tuesday, June 27, 2017

Exploring AIC (Akaike Information Criterion)

Goodness of Fit of Statistical Models 

AIC plays a key role in judging how well a probability model fits with respect to a model of reference. Here the ratio of two likelihood functions is taken, and its logarithm is equal to the difference between the two log-likelihood functions. So the greater the difference between the log-likelihood functions, the greater the difference in fit between the models.
AIC = -log(L(P1)/L(P2)) + k
Here P1 is the estimator of the parameter of model 1 and L(P1) is the likelihood function obtained from probability model 1. L(P2) is the likelihood function obtained from probability model 2, with P2 as the estimator of the parameter of this model. k is the degrees of freedom.
AIC = -[log(L(P1)) - log(L(P2))] + k
This expression shows that the further log(L(P1)) falls below log(L(P2)), the higher the value of AIC. So a large gap between the two log-likelihoods indicates greater dissimilarity between model 1 and model 2. For relatively close models AIC should be lower than for relatively distant models. A smaller value of AIC implies that the fit is good, so the model with the higher value of AIC (with respect to a model of reference) should be rejected.
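To make the comparison concrete, here is a minimal Python sketch. It uses the standard textbook form AIC = 2k - 2 log(L), where k is the number of estimated parameters, rather than the relative expression written above, and it compares a binomial model with a fitted proportion against a reference model with the proportion fixed at 0.5. The counts, the fixed value 0.5 and the function names are assumptions made purely for illustration.

```python
import math


def binomial_log_likelihood(counts, n, p):
    """Log-likelihood of independent Binomial(n, p) counts."""
    return sum(
        math.log(math.comb(n, x)) + x * math.log(p) + (n - x) * math.log(1 - p)
        for x in counts
    )


def aic(log_likelihood, k):
    """Standard AIC: 2k - 2 log(L), with k the number of estimated parameters."""
    return 2 * k - 2 * log_likelihood


# Made-up counts of successes in 10 samples of size 50 (illustration only).
counts = [12, 15, 9, 14, 11, 13, 10, 16, 12, 13]
n = 50

# Model 1: the proportion is estimated from the data (one free parameter).
p_hat = sum(counts) / (n * len(counts))
aic_fitted = aic(binomial_log_likelihood(counts, n, p_hat), k=1)

# Reference model: the proportion is fixed at 0.5 (no free parameter).
aic_reference = aic(binomial_log_likelihood(counts, n, 0.5), k=0)

print("AIC, fitted proportion:", aic_fitted)
print("AIC, fixed proportion :", aic_reference)
```

With these made-up counts the fitted model has the smaller AIC, so it would be preferred over the fixed-proportion model.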

Saturday, June 24, 2017

Fertility Rates as Development Indicators

Vital Statistics as Development Indicators 

The discussion of fertility rates as development indicators from the previous blog is continued here. In the image below the fertility rates of Nepal and Germany are compared, namely the age-specific fertility rate (ASFR) and the total fertility rate (TFR). This comparison shows that the development levels of two countries can be compared by comparing their fertility rates. In the previous blog only TFR was compared to explain this concept; in the table (image) below both ASFR and TFR are compared. ASFR is defined as the number of births per 1000 women in a given age group. For example, the ASFR for the age group 20-24 in Nepal was 257.4 in 1995-2000 and 231.2 in 2000-2005, implying that there were 257.4 births per 1000 women aged 20-24 in 1995-2000 and 231.2 in 2000-2005. This is a sharp decline of 26.2 births per 1000 women. In Germany the corresponding values are 58.5 and 53.4, a decline of only 5.1, which is not as drastic as in Nepal.
ASFR attains its peak in Nepal in the age group 20-24, which implies that women in Nepal most commonly bear children at ages 20-24. The ASFR data of Nepal based on the 2011 census also exhibit this pattern. In Germany, however, ASFR is highest in the age group 25-29. As a country becomes more developed, not only does the TFR decrease (Germany 1.3, Nepal 3.7 in 2000-2005), but the age at which ASFR attains its peak also increases. The reason is that as development activity increases, more and more women become economically active through the job opportunities that come with it, and this results in a decrease in TFR. Many women also postpone the birth of their first child to later years (25-29 in Germany), so that the highest ASFR falls in the age group 25-29.
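The link between the two measures can be sketched in a few lines of Python. Assuming ASFR is reported as births per 1000 women per year for five-year age groups, TFR is commonly computed as 5 times the sum of the ASFRs divided by 1000. The ASFR values below are made up for illustration; they are not the Nepal or Germany figures from the table, and the function names are hypothetical.

```python
def total_fertility_rate(asfr_per_1000, age_group_width=5):
    """TFR from ASFRs given as births per 1000 women per year in each age group."""
    return age_group_width * sum(asfr_per_1000.values()) / 1000


def peak_age_group(asfr_per_1000):
    """Age group in which ASFR is highest."""
    return max(asfr_per_1000, key=asfr_per_1000.get)


# Hypothetical ASFR schedule (illustration only, not real census data).
hypothetical_asfr = {
    "15-19": 50,
    "20-24": 200,
    "25-29": 150,
    "30-34": 80,
    "35-39": 40,
    "40-44": 15,
    "45-49": 5,
}

tfr = total_fertility_rate(hypothetical_asfr)
peak = peak_age_group(hypothetical_asfr)
print("TFR:", tfr, "children per woman; ASFR peaks in age group", peak)
```

A schedule that peaks later, as described for Germany, would move the peak age group to 25-29 without necessarily changing the way TFR is computed.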

Comparison between Germany and India


Monday, June 19, 2017

Fertility Rates as Development Indicators

Development Indicators


Commonly used fertility measures are the Total Fertility Rate (TFR), Age Specific Fertility Rate (ASFR), Crude Birth Rate (CBR), Net Reproduction Rate (NRR) and Gross Reproduction Rate (GRR). Each of these measures views the rate of growth of the population from a different perspective. TFR is defined as the average number of children that a woman bears over her entire reproductive span. According to the 2011 census the TFR of Nepal is 2.52, with 1.52 in urban areas and 3.08 in rural areas. This means that a woman bears on average 2.52 children over her reproductive span; women in urban areas bear 1.52 children on average, whereas women in rural areas bear 3.08. The TFR of Nepal in 1971 was 6.32, so Nepal has made big progress in reducing its TFR, from 6.32 in 1971 to 2.52 in 2011. A low TFR is related to high average life expectancy and a low Infant Mortality Rate (IMR). In the 2011 census the TFR of 2.52 is coupled with an average life expectancy of 66.6 years and an IMR of 40.5, implying that in 2011 a woman bears about 2.5 children in her lifetime, a newborn child lives 66.6 years on average, and there are 40.5 infant deaths per 1000 live births.
In the absence of contraceptives and other birth control measures, the TFR of a country is around 6. This is reflected in Nepal's TFR of 6.32 in 1971. A high TFR is coupled with a high IMR and low life expectancy at birth, a fact validated by the rural-urban differential in these measures. According to the 2011 census of Nepal, the TFR is 1.54 in urban and 3.08 in rural areas, the IMR is 24.06 in urban and 42.9 in rural areas, and life expectancy at birth is 70.5 years in urban and 66.6 years in rural areas. A high TFR indicates poor health facilities and health conditions in governmental hospitals in rural areas in contrast to urban areas. This is also coupled with lower average life expectancy and a much higher IMR in rural areas.
A high IMR also implies a large number of infant deaths due to poor nutrition of the mother and poor health facilities. Many African countries with a high incidence of diseases such as malaria and HIV/AIDS, like Sierra Leone and Angola, have low average life expectancy (50.1 and 52.4 years respectively in 2016) and high TFR (4.76 and 5.31 respectively in 2016). Many countries with a low TFR (lower than 2) have a declining population, as two people, a mother and a father, are replaced by fewer than two people. Countries with low TFR have high average life expectancy and low IMR. For example, in 2016 the TFR of Italy was 1.43, which is very low, and it is coupled with a very low IMR of 3.3 deaths per 1000 live births and an average life expectancy of 82.7 years. This is due to the good health facilities provided by the governments of such countries. A low IMR also indicates that mothers receive a nutritious diet. So TFR can be related to the development status of a country: a country with low TFR has high socioeconomic and developmental status in contrast to countries with high TFR. This discussion will be continued in the coming blogs.

Thursday, June 8, 2017

Maximum Likelihood Estimation illustrated with an example

Predicting the risk of Zika Virus infection

Here the concept of MLE explained in the blog of June 6 is illustrated with an example. Let P denote the proportion of babies infected by the Zika virus among the population of Zika-infected pregnant mothers in an area, say Brazil. P is a population parameter: it is unknown, but we want to know its true value. It is impossible to observe the entire population, and hence the population parameter, due to time and monetary constraints. So 10 samples of 50 Zika-infected pregnant women each are selected for the estimation of P; that is, n = 50 and m = 10. We are interested in the number of babies infected with the Zika virus among the 50 Zika-infected pregnant women in each of the 10 batches. Xi is the random variable denoting the number of Zika-infected babies in the i-th sample of size 50, so Xi = 0, 1, 2, ..., 50. The MLE is then the total number of Zika-infected pregnant women bearing Zika-infected babies divided by 500. This is the fraction of women bearing Zika-infected babies among the 500 Zika-infected pregnant women, and it is the MLE of P.
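The calculation can be sketched in a few lines of Python. The counts below are hypothetical numbers invented for illustration, not real Zika data, and the function name is likewise an assumption; only n = 50 and m = 10 come from the example above.

```python
def mle_proportion(infected_counts, sample_size):
    """MLE of P for binomial samples: total infected babies / total women sampled."""
    total_infected = sum(infected_counts)
    total_women = sample_size * len(infected_counts)
    return total_infected / total_women


# Hypothetical number of Zika-infected babies in each of the m = 10 samples
# of n = 50 Zika-infected pregnant women (illustration only).
infected_counts = [4, 6, 3, 5, 7, 2, 5, 4, 6, 8]
n = 50

p_hat = mle_proportion(infected_counts, n)
print("Estimated proportion of infected babies:", p_hat)
```

With these invented counts, 50 infected babies among 500 women give an estimate of 0.1.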

Tuesday, June 6, 2017

Exploring MLE and Maximum Likelihood Estimation

Maximizing the Likelihood

Maximum likelihood estimation chooses a sample statistic that maximizes the likelihood (the probability function) of the observed sample as a function of the parameter. It is based on the concept of maxima and minima. The likelihood is maximized by taking the derivative of the log of the likelihood function with respect to the parameter and equating it to zero. This gives an estimator of the parameter based on the sample, the maximum likelihood estimator (MLE). The second derivative of the log-likelihood function is less than zero at the MLE, which confirms that it is a maximum. The probability function of the sample depends on a population parameter. The population is unknown, and so is the population parameter, but the sample is known, and we look for the estimator, based on the sample, that maximizes the probability (likelihood) of this sample. For example,
Xi ~ B(n, P), that is, Xi follows a Binomial distribution with parameters n and P, for i = 1, 2, ..., m.
Then the MLE of P is (sum of Xi over all m samples)/(nm), as given in the image below. This will be illustrated with an example in the next blog.
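For readers who want the steps written out, the following is a sketch of the standard derivation for the binomial case, using the notation above with x_i as the observed value of Xi. It assumes the m samples are independent and 0 < P < 1.

```latex
\begin{align*}
L(P) &= \prod_{i=1}^{m} \binom{n}{x_i} P^{x_i} (1-P)^{\,n-x_i} \\
\log L(P) &= \sum_{i=1}^{m} \log\binom{n}{x_i}
  + \Big(\sum_{i=1}^{m} x_i\Big)\log P
  + \Big(nm - \sum_{i=1}^{m} x_i\Big)\log(1-P) \\
\frac{d}{dP}\log L(P) &= \frac{\sum_{i=1}^{m} x_i}{P}
  - \frac{nm - \sum_{i=1}^{m} x_i}{1-P} = 0
  \quad\Longrightarrow\quad
  \hat{P} = \frac{\sum_{i=1}^{m} x_i}{nm} \\
\frac{d^2}{dP^2}\log L(P) &= -\frac{\sum_{i=1}^{m} x_i}{P^2}
  - \frac{nm - \sum_{i=1}^{m} x_i}{(1-P)^2} < 0
\end{align*}
```

The negative second derivative confirms that the stationary point is a maximum of the log-likelihood.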