With an estimated 62 million children under the age of five experiencing stunted growth, India is at the epicentre of the child undernutrition crisis [1,2]. Although there has been a steady decline over the past two decades, a third of India’s child population experiences stunted growth and one out of five Indian children suffers from wasting. Expert analyses of child undernutrition in India reveal barriers to improvement at the sociocultural and governance level [3–5]. Improving women’s status and living conditions can help address the underlying causes of child undernutrition in India [6–9]. At the same time, programs for improving proximal determinants of child growth and development, ie, access to health care, supplemental nutrition, and provision of balanced meals, are needed to prevent undernutrition among India’s vulnerable population [3–5,10–12]. India has several national-level programs that promise to improve the nutritional status of the nation’s children by 1) providing supplemental nutrition to pregnant and lactating mothers, 2) improving prenatal and infant care, and 3) addressing upstream determinants of child undernutrition, ie, parental employment, access to clean water and toilet facilities, and sanitary hygiene. However, these programs are operated by different ministries at the national level, resulting in fragmented implementation and limited accountability [3–5].
To address these governance issues, India launched a three-year ₹9046 crore (roughly equivalent to US$1.3 billion) National Nutrition Mission (NNM) in 2018 with the objective of bringing convergence across existing national-level programs. The goal of the mission is to reduce the prevalence of childhood stunting from 38.4% to 25% by 2022. A main component of the NNM is to leverage mobile technology for efficient implementation and real-time monitoring of program activities. The NNM has developed a mobile application that serves as a dashboard for all nutrition-related programs. The mission also provides a mobile phone to 1.4 million frontline health workers (FHWs) to deliver services and report progress more effectively to their supervisors. The guidelines and resources for the implementation of these programs are allocated by the national government, which prioritizes 18 out of 36 states designated as high or special focus, as well as 184 out of the 640 districts classified as high-priority districts [14–17]. However, a substantial burden of undernutrition persists outside of these high-priority regions [18,19]. In order to empower FHWs to intervene in a timely manner and prevent child undernutrition, it is important to develop a mechanism to stratify at-risk children so that FHWs’ efforts and resources can be deployed most efficiently.
Therefore, the purpose of this study was to develop and internally validate a model that could predict the outcome of child undernutrition in the first five years of life using data available at the time of delivery. We chose to limit the predictive variables to those collectable at the time of delivery, because FHWs routinely register the birth of new children within their community using the Mother and Child Tracking System (MCTS). A risk score calculated from these data can be employed to stratify children and direct attention to those at high risk of developing undernutrition.
Data sources and procedures
Data from the fourth round of the National Family Health Survey (NFHS-4), 2015-2016, were used in the analyses. The NFHS-4 was conducted by the International Institute for Population Sciences (IIPS), Mumbai, under the stewardship of the Ministry of Health and Family Welfare, Government of India. The protocol for the NFHS-4 survey was approved by the IIPS Institutional Review Board (IRB) and reviewed by the US Centres for Disease Control and Prevention. The University of Massachusetts Medical School IRB reviewed the protocol for the secondary data analyses presented in this manuscript and deemed it exempt from full review, because the data contained no personally identifiable information.
The NFHS-4 sample was designed to be nationally representative and to provide estimates of indicators at the district level (n = 640) across the 36 states and union territories. It was derived with a stratified two-stage sampling method using the 2011 census as the sampling frame for selection of the Primary Sampling Units (PSUs): villages in the rural stratum and census enumeration blocks in the urban stratum. Within each stratum, PSUs were selected with probability proportional to size. In every selected PSU, a complete household mapping and listing operation was conducted. In the second stage, 22 households were randomly selected using systematic sampling from each PSU. This process identified 628 900 households, of which 616 346 were occupied, and 601 509 were interviewed (97.6% response rate). A total of 699 686 women aged 15-49 years responded to the women’s questionnaire and provided information on 268 873 children aged 0-59 months. Data were collected by a total of 789 field teams over a period of almost two years (January 20, 2015, to December 4, 2016). Questionnaires were administered in 17 local languages using computer-assisted personal interviewing. Weight and height were measured for all children aged 0-59 months and women aged 15-49 years.
Figure 1 describes the derivation of the analytic sample used in this study. Children who were not alive at the time of the survey, or who did not belong to the household but were present at the time of the survey, were excluded from the analyses. Children were also excluded if their younger sibling was interviewed in the survey, because inclusion of multiple siblings would violate the assumption of independent observations. Of the 167 711 eligible children, data from 129 040 were used to build the predictive model for child undernutrition; the remainder were excluded because of missing or implausible data.
Figure 1. Flowchart demonstrating exclusions and final analytic sample of the 2015-16 National Family Health Survey from India included in this study.
Child undernutrition is routinely assessed using anthropometric indicators, namely stunting (height-for-age z score < -2), underweight (weight-for-age z score < -2), and wasting (weight-for-height z score < -2). Emerging scholarship suggests that a focus on any single indicator underestimates the overall prevalence of child undernutrition. Instead, the Composite Index of Anthropometric Failure (CIAF), which considers a child to be undernourished if any of the three forms of undernutrition is present, is recommended. Therefore, we considered a child to be undernourished if any of the height-for-age, weight-for-age, or weight-for-height z scores was below -2.
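The outcome definition above can be illustrated with a short sketch. It assumes the three z scores have already been computed against a growth reference and are supplied as plain numbers; the function and argument names are hypothetical and are not taken from the NFHS-4 data files.

```python
Z_CUTOFF = -2.0  # threshold used in the text for all three indicators


def is_undernourished(haz, waz, whz):
    """Return True if the child meets the CIAF definition used in this study.

    haz, waz, whz are the height-for-age, weight-for-age, and
    weight-for-height z scores (hypothetical argument names).
    """
    scores = (haz, waz, whz)
    if any(z is None for z in scores):
        # Assumption: children with a missing z score are handled upstream
        # (for example, excluded), so this sketch refuses to classify them.
        raise ValueError("all three z scores are required")
    # A child is undernourished if ANY single indicator falls below -2.
    return any(z < Z_CUTOFF for z in scores)
```

For example, a child who is stunted only (`haz = -2.5`, `waz = -1.0`, `whz = -0.5`) is classified as undernourished, whereas a child whose lowest z score is exactly -2.0 is not, because the cutoff is a strict inequality.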
Predictor variables were identified using the integrated framework of child undernutrition. Risk factors that cannot be collected at the time of delivery, eg, breastfeeding practices, infant dietary diversity, and childhood illnesses, were excluded. Additional risk factors specific to the Indian context were identified via literature review [22–24]. Maternal stature, education, ability to read the local language, preceding birth interval, and age at first birth were considered. Antenatal receipt of iron supplementation for at least three months and at least four maternal visits were also considered. Child-related factors included birthweight and sex. The household conditions considered for the analyses were number of siblings, access to a toilet, rural residence, ownership of a below poverty level card, caste and religion of the household head, treatment practices for drinkable water, construction type of the house, floors, and walls, use of soap for handwashing, presence of a separate kitchen, and use of non-solid fuel. The operational definitions of these variables were based on NFHS-4 guidelines and are enumerated in Table 1. The mean prevalence of CIAF per district was considered in decile groupings. Residence in high focus states and/or high priority districts was modelled using separate dummy variables.
Table 1. Empirical distribution and weighted proportion of predictors of child undernutrition for all eligible children with anthropometric data (n = 167 711) from the 2015-2016 National Family Health Survey of India
CIAF – composite index of anthropometric failure, w-% – weighted estimate of proportion
All statistical analyses were performed in Stata 15 MP using the svyset and svy commands to account for complex survey weighting of each participant. The associations between predictive variables and CIAF were evaluated using bivariate and multivariable logistic regression models to derive unadjusted and adjusted odds ratios. Individual predicted probabilities corresponding to unadjusted and adjusted models were calculated for the analytic sample using the inverse logit transformation of the beta coefficients applied to each participant’s covariate values. Each model’s discrimination ability was assessed by calculating the c-statistic using the “roctab” command for the predicted probabilities against the CIAF outcome. Similarly, model calibration was assessed using Brier scores and Hosmer-Lemeshow goodness of fit tables using decile grouping of predicted probabilities against the binary outcome variable. To achieve a parsimonious model, predictor variables were added in a stepwise manner and included in the model if their inclusion increased the full model’s multivariable c-statistic by at least 0.01 or reduced the Brier score by at least 0.001. To achieve a model that is easy to implement in the real world, continuous variables were categorized and the number of categories for existing discrete variables was minimized using receiver operating characteristic curve analyses for multivariable models. Birth weight was transformed to clinically meaningful categories of extremely low birth weight (<1800g), low birth weight (1800-2500g), or normal birth weight (>2500g). Maternal height was categorized into three categories (5′2″ or taller, 4′8″ to less than 5′2″, and less than 4′8″) by examining a plot of multivariable c-statistic against maternal height values to the nearest inch. The final step examined interactions between variables in the fully specified model. Because inclusion of interaction variables did not improve model performance, they were not included in the final model.
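The performance measures described above can be sketched in a few lines of Python. This is an illustration rather than the Stata code used in the study: the function names, the coefficient dictionary, and the optional `weights` argument are assumptions introduced here, and the pairwise c-statistic shown is a simple quadratic-time version that would be slow on the full analytic sample.

```python
import math


def inverse_logit(x):
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_probability(intercept, betas, covariates):
    """Predicted probability for one child.

    betas: dict of coefficient name -> beta (hypothetical names).
    covariates: dict of the same names -> 0/1 indicator values.
    """
    linear = intercept + sum(b * covariates.get(name, 0) for name, b in betas.items())
    return inverse_logit(linear)


def brier_score(probs, outcomes, weights=None):
    """Weighted mean squared difference between prediction and outcome."""
    weights = weights or [1.0] * len(probs)
    total = sum(w * (p - y) ** 2 for p, y, w in zip(probs, outcomes, weights))
    return total / sum(weights)


def c_statistic(probs, outcomes, weights=None):
    """Weighted share of case/non-case pairs ranked correctly (ties count half)."""
    weights = weights or [1.0] * len(probs)
    cases = [(p, w) for p, y, w in zip(probs, outcomes, weights) if y == 1]
    controls = [(p, w) for p, y, w in zip(probs, outcomes, weights) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = total = 0.0
    for p_case, w_case in cases:
        for p_ctrl, w_ctrl in controls:
            pair_weight = w_case * w_ctrl
            total += pair_weight
            if p_case > p_ctrl:
                concordant += pair_weight
            elif p_case == p_ctrl:
                concordant += 0.5 * pair_weight
    return concordant / total


def keep_predictor(c_before, c_after, brier_before, brier_after):
    """Inclusion rule from the text: c-statistic up by >= 0.01 or Brier down by >= 0.001."""
    return (c_after - c_before) >= 0.01 or (brier_before - brier_after) >= 0.001
```

A candidate predictor that raises the c-statistic from 0.65 to 0.67 would be retained under this rule even if the Brier score were unchanged, because the two criteria are joined by "or".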
Internal validation of the final model was performed using previously established methods to test and correct for optimism, which is the difference between the model’s c-statistic and the bias-corrected c-statistic of resampled data sets using nonparametric bootstrap methods [26,27]. A simulation study comparing internal validation performance of various methods found that the bootstrapping method outperformed various split-sample methods. Therefore, a total of 200 data sets were resampled, representing participants corresponding to 14 000 PSU clusters selected with replacement. The final estimate of internal validity was derived by subtracting optimism from the model’s c-statistic to penalize the model for overfitting.
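A minimal sketch of an optimism correction with cluster-level resampling is given below. It assumes the commonly used bootstrap procedure in which the model is refitted in each resampled data set and optimism is the average difference between its performance in the resampled data and in the original data; whether this matches the exact implementation used in the study is an assumption. The `fit` and `score` callables and the `clusters` mapping are hypothetical placeholders for the survey-weighted logistic regression, the c-statistic, and the PSU groupings.

```python
import random


def optimism_corrected(clusters, fit, score, n_boot=200, seed=0):
    """Bootstrap optimism correction, resampling whole clusters.

    clusters: dict mapping a cluster id (e.g. a PSU) to a list of records.
    fit(records) -> model; score(model, records) -> float (e.g. c-statistic).
    Returns (corrected_performance, optimism).
    """
    rng = random.Random(seed)
    ids = list(clusters)
    original = [rec for cid in ids for rec in clusters[cid]]

    # Apparent performance: model fitted and evaluated on the same data.
    apparent = score(fit(original), original)

    optimism_sum = 0.0
    for _ in range(n_boot):
        # Draw as many clusters as in the original sample, with replacement.
        sampled_ids = [rng.choice(ids) for _ in ids]
        boot = [rec for cid in sampled_ids for rec in clusters[cid]]
        boot_model = fit(boot)
        # Optimism in this replicate: performance where the model was fitted
        # minus performance of the same model on the original data.
        optimism_sum += score(boot_model, boot) - score(boot_model, original)

    optimism = optimism_sum / n_boot
    return apparent - optimism, optimism
```

Resampling clusters rather than individual children keeps children from the same PSU together in each replicate, which mirrors the clustered survey design.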
Table 1 describes the empirical and survey-weighted distribution of well-known predictors of child undernutrition among the children eligible for this analysis. More than half of the children (89 804; 53.6%) included in the analyses were undernourished based on the CIAF definition, corresponding to a weighted proportion of 54.4% (95% CI = 54.0%-54.8%). Three-fourths of all children surveyed lived in rural regions. More than half (54.0%) were male, and two-thirds had zero or one living sibling. Birth weight was available for 131 139 children and 37.0% weighed 2500g or less. The majority (83.7%) of the mothers of children included in the study had a stature shorter than 5 feet 2 inches. More than one in four mothers (28.5%) had no formal education and a third (32.5%) could not read their local language. Most children were from families that self-identified as belonging to an underprivileged caste (77%). Only 37.9% of children belonged to households that reported access to a toilet and two-thirds of the children belonged to families that use solid fuel for cooking. All predictors were closely associated with child undernutrition (P < 0.001).
After criteria-based model building and variable transformation, fifteen predictors, including three geographical variables, were selected in the final individual covariate model. The results of bivariate and multivariable logistic regression for the outcome of child undernutrition using these predictors are presented in Table 2. After adjusting for other covariates, all predictors remained strongly associated with the outcome of child undernutrition, although the effect estimates were attenuated. The exception was residence in high focus states or high priority districts. In bivariate regression, residence in a high focus state or high priority district increased the odds of child undernutrition. However, after accounting for other predictors, residence in these regions was associated with a moderate reduction in the odds of child undernutrition. Table S1 in the Online Supplementary Document describes the association of these factors with stunting as an outcome.
Table 2. Results of weighted logistic regression for the outcome of composite index of anthropometric failure in the first five years of life based on data from the 2015-2016 National Family Health Survey in India*
OR – odds ratio, LCI – lower confidence interval, UCI – upper confidence interval, β – beta-coefficient, CIAF – composite index of anthropometric failure
*Multivariable c-statistic = 0.68 (optimism-corrected = 0.67), Brier score = 0.225.
The final model had a reasonable discrimination ability as measured by a survey-weighted c-statistic of 0.68 (optimism-corrected c-statistic: 0.67) and a Brier score of 0.225. The Hosmer-Lemeshow goodness of fit table for distribution of observed and expected prevalence for each decile risk group is presented in Table 3. One in four children categorized into the lowest risk group was undernourished, while four in five children in the highest risk group were undernourished. The observed and expected prevalence were within the 95% CI for each risk grouping. Table 4 describes model performance and discrimination ability across different sub-groups. Overall, the model calculated the lowest individual probability of child undernutrition at 13.6% and the highest probability at 92.0%. The model performed consistently across all subgroups except for children under the age of six months, for whom the model had poor discrimination ability (c-statistic = 0.63).
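The kind of decile-based comparison of observed and expected prevalence described above can be sketched as follows. This is an unweighted illustration under the assumption that predicted probabilities and binary outcomes are available as parallel lists; it does not apply survey weights or compute confidence intervals, and the function name is hypothetical.

```python
def decile_calibration_table(probs, outcomes, n_groups=10):
    """Group children by predicted risk and compare expected vs observed prevalence.

    probs: predicted probabilities of undernutrition.
    outcomes: 0/1 CIAF outcome for the same children.
    Returns one row (dict) per risk group, ordered from lowest to highest risk.
    """
    pairs = sorted(zip(probs, outcomes))  # sort children by predicted risk
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        # Integer arithmetic splits the sorted list into near-equal groups.
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue
        size = len(group)
        rows.append({
            "group": g + 1,
            "n": size,
            "expected": sum(p for p, _ in group) / size,  # mean predicted risk
            "observed": sum(y for _, y in group) / size,  # observed prevalence
        })
    return rows
```

A well-calibrated model yields rows in which the `expected` and `observed` values are close within every risk group, not only on average.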
Table 3. Hosmer-Lemeshow goodness of fit table for distribution of observed CIAF prevalence vs predicted prevalence across decile risk groups for undernutrition among the children under the age of five from the 2015-2016 National Family Health Survey in India
CIAF – composite index of anthropometric failure
Table 4. Model performance and discrimination ability across different sub-groups
CIAF – composite index of anthropometric failure
In this study, we used data from a nationally representative survey to develop an algorithm that can predict the five-year risk of undernutrition among Indian children at the time of their delivery. The model uses information about the child, the child’s mother, the child’s household, and the child’s geographical region. All factors included in the final model have been identified as closely associated with child undernutrition in previous studies using data from the 2004-2005 National Family Health Surveys [22–24]. It is important to note that the factors included in the model do not represent a comprehensive list of predictors of child undernutrition in India, but rather those that can be collected at the time of delivery or in a reasonable time frame. For instance, breastfeeding practices and timely introduction of complementary foods play an important role in child nutrition but cannot be captured at the time of delivery. This approach lowers the performance of a predictive algorithm but allows for identification of at-risk children at the time of their birth and can empower FHWs to make informed decisions for prioritizing services and surveillance of vulnerable children.
Health information systems play an important role in facilitating routine service delivery activities by FHWs. India launched the Mother and Child Tracking System (MCTS) in 2009 as a web-based portal that collected data from FHWs for all pregnant women in their region, especially at the time of delivery. The MCTS generates automated schedules for FHWs for services due and sends SMS reminders to FHWs and the beneficiaries, ensuring continued medical care. Evaluation of the MCTS has demonstrated its benefits in helping FHWs more effectively provide services and follow up with mothers and young children in their region [30,31]. As technological capacity advances, health information systems such as the MCTS can be used to facilitate targeting the most vulnerable populations. Integration of the model developed in our study within the MCTS presents such an opportunity to employ data-driven approaches for improving decision making by FHWs, supervisors, and managerial health officials. Because ten of the fifteen questions included in our predictive model are already captured by the MCTS, the additional burden of data collection for FHWs (toilet access, living in a semi-finished or finished house, type of cooking fuel, using soap after toilet use) would be minimal if the predictive model were integrated as part of the MCTS.
The high prevalence of child undernutrition and the limited discrimination ability of the model dictate a careful use of the risk score. Intensification of resources for high-risk groups should be favoured over withdrawal of resources from low-risk groups. One approach could be to use the risk score to inform the frequency of counselling and follow-up from FHWs. A Bangladeshi program focused on a population of 8.5 million mothers led to rapid and significant improvements in key breastfeeding and complementary feeding practices because of promotion strategies that targeted high priority groups through more frequent contacts. The program found that complementing mass media campaigns with innovative approaches to improve the performance of FHWs in delivering timely counselling to high-risk mothers was central to its success. Thus, an automatically generated schedule for children based on their risk score can ease the burden on FHWs and allow them to provide services and monitor child growth in a timely manner.
In addition to helping FHWs risk-stratify and efficiently provide services for the most vulnerable children, the results of the model can also inform priority setting and resource allocation at the district, state, and national levels. Currently, the Indian government prioritizes funding allocation for implementation of national-level programs for high-focus states and high-priority districts [14–17]. Our results show that the children living in these regions were more likely to be undernourished, but after accounting for other risk factors, they had lower odds of being undernourished than a child with the same covariate pattern living in a normal-focus state or non-priority district. A possible explanation for this discrepancy might be that the added resources allocated to high-priority regions help prevent undernutrition among vulnerable children in comparison to their counterparts in other regions. Therefore, allocating resources based on the regional distribution of the risk score might provide a more equitable approach.
It is important to consider certain limitations of our study. This model is based on self-reported data collected as part of a cross-sectional survey study. Although the survey was conducted by research staff who received rigorous training in administering standardized questionnaires, recall bias on behalf of the respondents cannot be ruled out. However, the final model includes objective questions that are less prone to such bias. The use of an indicator variable (CIAF) likely leads to lack of precision and negatively impacts the performance of the model; however, we favour this approach, because CIAF in early childhood is associated with meaningful physical and cognitive outcomes. An underlying assumption of the model is that the responses about household characteristics, ie, access to a toilet, treatment of drinkable water, and use of cooking fuel, do not change throughout the first five years. It is plausible that water and sanitation hygiene improved after childbirth, especially if the child experienced illness or was undernourished. In this scenario, the coefficients of association presented in our analyses are likely underestimates. Due to the cross-sectional nature of the survey, we cannot test this assumption. Our final model is derived from nationally representative data, was internally validated using 200 random bootstrapped samples, and had reasonable predictive capability. However, further work is necessary to externally assess and validate this model and its performance in real-world settings.
One such opportunity comes from its integration within the existing MCTS portal. The additional data on feeding practices, immunization records, and growth outcomes as the child advances in age can be used to augment the model and calculate a dynamic risk-score for child undernutrition that changes over time. By comparing the risk score with the outcome of child undernutrition in the first five years, the model can be calibrated further using decision curve analyses to identify risk thresholds for child undernutrition. Thus, the current model developed by this study represents a first step in adopting a risk-score based approach for the most vulnerable population to receive services in a timely manner.
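As an illustration of how decision curve analysis could be used to compare candidate risk thresholds, the sketch below computes net benefit with the standard formula, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The function names are hypothetical, survey weights are ignored, and no thresholds from the study are implied.

```python
def net_benefit(probs, outcomes, threshold):
    """Net benefit of intervening on children whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(probs)
    true_pos = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    # False positives are penalised by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: intervene on every child regardless of predicted risk."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

Plotting `net_benefit` against a range of thresholds, alongside the treat-all reference and the treat-none reference (net benefit of zero), shows the thresholds at which using the risk score would add value over a uniform strategy.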
This article describes the development and validation of a predictive algorithm to identify a newborn’s risk of developing undernutrition in the first five years of life using data that are routinely available at the time of delivery. This approach can facilitate efficient allocation of scarce resources, especially when leveraged with existing public health infrastructure.