Pneumonia is one of the leading causes of mortality in children aged below five years worldwide . The overall global incidence of pneumonia is 0.22 (IQR = 0.11-0.51) episodes per child-year . Approximately 68 million pneumonia episodes and 650 000 deaths due to pneumonia were estimated to have occurred in 2016 . There is a notable discrepancy between the incidence of pneumonia in high-income countries, in comparison to low- and middle-income countries (LMICs) . Pneumonia presents a substantial burden on health services and is a major cause of hospital admissions in children . In LMICs, the recognition of pneumonia and care-seeking behaviour is generally poor . An important factor limiting the effective diagnosis and treatment of pneumonia in LMICs is a lower doctor-to-population ratio . Moreover, access to doctors and hospitals is usually more difficult [7,8], and the cost of treatment is often prohibitive for caregivers . Therefore, a significant proportion of pneumonia is diagnosed and treated outside hospitals by non-physician health workers . During household visits or community health centre patient encounters, these health workers apply pragmatic case management algorithms to make decisions on diagnosis, treatment, and referral of children suspected to have pneumonia [11,12]. Community-based management of pneumonia by health workers has had a substantial effect on reducing child mortality .
According to the World Health Organisation (WHO) guidelines, pneumonia diagnosis in children is primarily based on increased respiratory rate (RR). The number of breaths is manually counted for 60 seconds using an acute respiratory illness (ARI) timer or a watch and is then classified as fast or normal breathing according to the child’s respective age group [14,15]. The measurement of RR is challenging, however, and is frequently miscounted, often due to the child’s movement or shallow, irregular breathing. Counting of RR is often not done routinely by health workers as it is difficult, time-consuming, and depends on the availability of timers. Moreover, a clear definition of a breath is not available within WHO guidelines . This has implications for the quality of clinical practice, as it can lead to under-diagnosis, misdiagnosis, and insufficient or inappropriate treatment [17–19].
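As an illustration of the classification step described above, the following minimal sketch maps a counted RR to "fast" or "normal" breathing. The age bands and cut-offs (60 bpm or more below 2 months, 50 bpm or more at 2-11 months, 40 bpm or more at 12-59 months) are the commonly cited WHO thresholds and are stated here as an assumption, since this text does not list them. The function names are hypothetical.

```python
def who_fast_breathing_threshold(age_months: float) -> int:
    """Return the RR cut-off (breaths per minute) at or above which
    breathing is classed as fast. Cut-offs are assumed WHO values."""
    if not 0 <= age_months < 60:
        raise ValueError("age must be between 0 and 59 months")
    if age_months < 2:
        return 60  # young infants
    if age_months < 12:
        return 50  # 2-11 months
    return 40      # 12-59 months


def classify_breathing(age_months: float, rr_bpm: int) -> str:
    """Classify a 60-second breath count as 'fast' or 'normal'."""
    threshold = who_fast_breathing_threshold(age_months)
    return "fast" if rr_bpm >= threshold else "normal"
```

Because the result depends only on which side of the cut-off the count falls, two observers whose counts differ by a few breaths will often still agree on the classification.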
The diagnosis of pneumonia in LMICs largely depends on health workers’ ability to count RR and classify fast and normal breathing accurately. Despite existing literature evaluating the ability of health workers to count and classify fast breathing pneumonia, to our knowledge the evidence has not yet been systematically collated. As the existing literature involves studies with small numbers, a systematic review would allow more robust evidence to inform clinical practice and policy implementation. In this review, we summarized the evidence on whether health workers can accurately measure RR and identify fast breathing in children under five years of age.
We conducted this systematic review following the methodology described in the Handbook for Diagnostic Test Accuracy (DTA) Reviews of Cochrane. We used the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) 2020 and the Preferred Reporting Items for Systematic Reviews and Meta-analyses of Diagnostic Test Accuracy Studies (PRISMA-DTA) in reporting our findings. The review protocol was registered with the PROSPERO database (registration number CRD42020211127).
Population, index test, reference standard, and target condition
The target participants were children under five years of age who had their RR assessed in the community or when attending a health facility. The index test was RR counting and/or fast breathing assessment done manually by non-physician health workers. RR counting and/or fast breathing identification by a human expert or an automated device were considered reference standards. The experts were experienced paediatricians, clinicians, or other persons who were trained in clinical algorithms of pneumonia in children.
We developed a search strategy using a combination of topic-related medical subject headings (MeSH) and keywords. The key concepts were “pneumonia” AND “respiratory rate” AND “accuracy” AND “children under five years of age”. We comprehensively searched MEDLINE (via Ovid), EMBASE (via Ovid), Web of Science, and Scopus databases. The detailed search strategy used for each database is reported in Table S1 in the Online Supplementary Document. Included studies were published between January 1st, 1990, and August 9th, 2020. We sought to identify other potentially relevant studies by subjecting all included studies to a forward citation search and examining their reference lists. There were no restrictions on language in the searches. An expert librarian verified the search strategy.
Studies were included if they met the following criteria:
- Measurement of RR and/or identification of fast breathing were done manually by non-physician health workers.
- A reference standard was used to evaluate the accuracy of RR and/or identifying cases with fast breathing.
- Age of the participants was less than five years.
- Conducted in LMICs. The list of LMICs was obtained from the UN Statistics Division (Table S2 in the Online Supplementary Document).
Studies were excluded by the following criteria:
- Non-human animal subjects, or mechanically ventilated subjects.
- Information on reference standard was lacking.
- Health workers used a device other than an ARI timer or a watch to measure RR.
- Health workers counted RR from videotaped subjects.
- Disaggregation of data on RR or fast breathing was not possible.
- Disaggregation of data in under-five children was not possible.
Study selection and data extraction
We downloaded the literature search results from different databases into the EndNote X9 reference management software. After excluding duplicates, two review authors (AMK and AOD) independently examined the titles and/or abstracts of the identified studies and excluded irrelevant studies. They then independently analysed the full texts of potentially relevant articles according to the pre-specified eligibility criteria. Disagreements were resolved through a discussion between the two reviewers.
The review authors extracted data from studies using a structured checklist (Table S3 in the Online Supplementary Document) and entered those into the Microsoft Excel spreadsheet. Any disagreements were resolved through discussion.
Both reviewers used the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool to assess the quality of the included studies. Four domains (i.e., patient selection, index test, reference standard, and flow and timing of the participants) were assessed for risk of bias. Each domain has a set of core signalling questions. The answer to each signalling question was “yes”, “no” or “unclear”, and the risk of bias was considered as “low”, “high” or “unclear”. The “unclear” category was used only when insufficient data were reported. An individual domain was considered “low risk” if the answers to all signalling questions were “yes”; “high risk” if at least one answer was “no”; and “unclear” if at least one answer was “unclear” and no answer was “no”. Both review authors checked the risk of bias independently and any disagreement was settled through discussion. We entered these data into Review Manager (version 5.3) to create the figure used in this paper.
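The domain-level decision rule described above can be sketched as a small function. This is only an illustration of the rule as stated in the text, not the reviewers' actual workflow (judgements were entered into Review Manager); the function name is hypothetical.

```python
def domain_risk_of_bias(answers: list[str]) -> str:
    """Derive a QUADAS-2 domain judgement from its signalling answers.

    Rule as described in the text: any "no" gives high risk; otherwise
    any "unclear" gives unclear risk; all "yes" gives low risk.
    """
    normalised = [a.strip().lower() for a in answers]
    if not normalised or any(a not in {"yes", "no", "unclear"} for a in normalised):
        raise ValueError("answers must be a non-empty list of yes/no/unclear")
    if "no" in normalised:
        return "high"
    if "unclear" in normalised:
        return "unclear"
    return "low"
```

Checking for "no" before "unclear" encodes the precedence in the rule: a single "no" makes the domain high risk regardless of the other answers.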
Data synthesis and analysis
For the studies reporting agreement of RR counts between health workers and the reference standard, we presented the percentage of agreement and calculated median agreement with the range of values. For the studies reporting accuracy of classifying fast and normal breathing compared to a reference standard, we presented sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and kappa value of individual study if data were available, and we calculated median values with ranges.
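For readers less familiar with these measures, the sketch below computes them from a 2 × 2 table in which the reference standard defines the true status. The formulas are the standard definitions (kappa is Cohen's kappa); the function name and the example counts are hypothetical and do not come from any included study.

```python
def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard accuracy measures for a 2x2 table (index test vs reference).

    Assumes every row and column total is non-zero; otherwise some
    measures are undefined and a ZeroDivisionError is raised.
    """
    n = tp + fp + fn + tn
    if min(tp, fp, fn, tn) < 0 or n == 0:
        raise ValueError("counts must be non-negative and not all zero")
    observed = (tp + tn) / n  # proportion of observed agreement
    # Agreement expected by chance, from the marginal totals
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": observed,
        "kappa": (observed - expected) / (1 - expected),
    }


# Hypothetical counts, for illustration only
measures = diagnostic_accuracy(tp=40, fp=10, fn=10, tn=40)
```

Unlike sensitivity and specificity, PPV and NPV depend on the prevalence of fast breathing in the sample, so they are not directly comparable across studies with different case mixes.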
We performed a meta-analysis with those studies reporting classification of fast and normal breathing where true positive (TP), false positive (FP), false negative (FN), and true negative (TN) data could be retrieved. We estimated sensitivity and specificity with 95% confidence intervals (CI) for each study and presented those in paired forest plots to inspect the variation between studies. We fitted hierarchical summary receiver operating characteristic (HSROC) models using user-written modules (metandi, midas) [26,27] in the Stata statistical software (version 16.0) to assess accuracy of fast breathing identification. Heterogeneity among studies was evaluated visually, from the coupled forest plots, and statistically, using the I-square statistic. We used univariate meta-regression to perform subgroup analyses. The parameters for subgroup analysis were as follows: child age, study setting, fast breathing prevalence in the sample, diagnosing health worker, and timing of RR measurement by index test and reference standard.
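The per-study estimates that feed the forest plots can be illustrated with a short sketch. It is not the Stata procedure used in this review: the Wilson score interval is an assumed choice of confidence interval method (the text does not state which was used), and `i_squared` implements the generic Higgins definition from Cochran's Q, which may differ from how the Stata modules derive their statistic. All function names are hypothetical.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 for about 95%)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def sensitivity_specificity_ci(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Point estimate and interval for one study's sensitivity and specificity."""
    return {
        "sensitivity": (tp / (tp + fn), *wilson_interval(tp, tp + fn)),
        "specificity": (tn / (tn + fp), *wilson_interval(tn, tn + fp)),
    }


def i_squared(q: float, df: int) -> float:
    """Higgins I-square (%) from Cochran's Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100
```

I-square expresses the share of total variation across studies that is attributable to heterogeneity rather than chance, which is why it is truncated at zero when Q falls below its degrees of freedom.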
We performed a sensitivity analysis restricted to studies where fast breathing was defined using WHO RR thresholds. We did not conduct tests for reporting bias due to the ambiguity of the factors of publication bias for diagnostic accuracy studies and the inadequacy of tests for identifying asymmetry of a funnel plot.
Result of the search
The review process is summarised in Figure 1 using the PRISMA flowchart. Seventeen reports describing 16 studies met all the criteria for inclusion in this review [17–19,30–43]. Two reports used the same data set but were both included, as they measured different outcomes [35,36]. Only seven studies reporting accuracy of classifying fast and normal breathing presented TP, FP, TN, and FN data, and those studies were included in the meta-analysis [17,18,31,33,35,42,43]. The list of excluded reports with exclusion reasons is available in Table S4 in the Online Supplementary Document.
Figure 1. PRISMA flow diagram.
Characteristics of the included studies
Table 1 summarises the characteristics of the included studies. Most studies were conducted in Africa [17–19,30,32,33,35–43], two in Asia [30,31] and one in Oceania . 10 studies were based at a health facility [18,30,33,34,37–42], while five were in the community [19,31,32,35,36,43], and one was in a training centre . Studies differed in assessed population, with nine studies assessing children aged 2-59 months [19,32,35,36,38–43], two studies assessed only young infants [31,33], and the remaining studies assessing children varying from 0 to 59 months of age [17,18,30,34,37]. The clinical encounters recorded per study ranged from 34 to 564. The majority of the studies evaluated community-based health workers [17–19,30–32,35–39,43], while three studies evaluated facility-based health workers [40–42] and one study evaluated both . The number of health workers per study ranged from 6 to 154. In most of the studies, the health workers received training before starting the study. The duration of the training ranged from two days to nine months. Only one study used an automated method – the Masimo Root patient monitoring and connectivity platform with ISA CO2 Capnography – to measure RR as the reference standard . The remaining studies used a manual count done by an “expert”.
Table 1. Characteristics of the included studies
RR – Respiratory rate, FB – Fast breathing, CHW – community health worker, bpm – breath per minute, WHO – World Health Organisation
In four studies, health workers and reference standard counted RR simultaneously [17,30,37,43], while there was a short delay (i.e., reference standard measured RR immediately after health worker assessment) in ten studies [18,19,32–36,38,39,41,42] and a long delay (i.e., reference standard measured RR a few hours after health worker assessment) in two studies [31,40]. Studies differed in outcome assessed, with eight studies reporting the percent agreement of RR measurement [17–19,30,36,37,39,43], two reporting Bland-Altman plot to visualise RR agreement [30,43], and 15 reporting correct classification of fast and normal breathing [17–19,30–35,37,38,40–43]. Out of the eight studies reporting agreement in RR, four defined agreement if the difference in RR was within ±2 breaths per minute (bpm) [18,30,36,43], three within ±3 bpm [37,39,43], and four within ±5 bpm [17,19,37,43]. Among the 15 studies reporting accuracy of fast breathing identification, two studies did not use the WHO RR threshold to classify fast breathing [33,34].
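The tolerance-based definitions of agreement used by the included studies can be made concrete with a minimal sketch. The function name and the paired counts are hypothetical and serve only to show the calculation.

```python
def percent_within(health_worker_counts: list[int],
                   reference_counts: list[int],
                   tolerance_bpm: int) -> float:
    """Percentage of paired RR counts that differ by at most tolerance_bpm."""
    if len(health_worker_counts) != len(reference_counts) or not reference_counts:
        raise ValueError("need two non-empty lists of equal length")
    within = sum(
        abs(hw - ref) <= tolerance_bpm
        for hw, ref in zip(health_worker_counts, reference_counts)
    )
    return 100 * within / len(reference_counts)


# Hypothetical paired counts (breaths per minute), for illustration only
hw = [40, 52, 61, 38, 47]
ref = [42, 50, 55, 38, 53]
agreement = {tol: percent_within(hw, ref, tol) for tol in (2, 3, 5)}
```

Because a wider tolerance can only add agreeing pairs, the percentage is non-decreasing from ±2 to ±5 bpm, which is why results reported under different definitions cannot be pooled directly.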
Methodological quality of included studies
The assessment of methodological quality is presented in Figure 2. In general, the risk of bias was low or unclear. For patient selection, we evaluated four studies as having a high risk of bias because of non-consecutive or non-random sample selection [17,19,36,40], and six studies as having unclear risk of bias because of a poorly described sampling method [18,37–39,43] or exclusion criteria. For the index test, we evaluated all studies as having a low risk of bias because the health workers of all studies were blinded to the result of the reference standard, and a pre-specified threshold was used to classify fast breathing. For the reference standard, we evaluated four studies as having a high risk of bias: two studies did not use the WHO RR threshold to classify fast breathing [33,34], and in two studies the reference standard was unblinded [40,41]. Seven studies had unclear risk of bias because of poor reporting on blinding [18,19,31,41,42] and qualification of the experts [17,34–36,40]. For patient flow and timing assessment, we deemed three studies to have a high risk of bias. Among these, a long delay between index test and reference standard was present in two studies [31,40], and one study excluded a certain number of patients from the analysis without proper reporting. Most of the studies had low concerns regarding applicability for all domains. The main concerns were related to inclusion criteria for patient selection in one study and inappropriate classification of fast breathing for reference standard in two studies [33,34]. Overall, concerns regarding the applicability of the results were low.
Figure 2. Risk of bias and applicability concerns summary: review authors’ judgements about each domain for each included study.
Agreement in respiratory rate count between health workers and reference standard
Table 2 presents the summary findings for the eight studies reporting the agreement in RR count between health workers and reference standards. Definitions of agreement in RR count varied across studies. Table 3 shows that the overall median agreements of the health workers were 39%, 47%, and 67% within ±2 bpm, ±3 bpm, and ±5 bpm of reference standards, respectively. The agreements of RR in terms of age groups, settings, types of health workers, and types of reference standards are also presented.
Table 2. Studies reporting agreement in respiratory rate count between health workers and reference standards
bpm – breath per minute, CHW – community health worker
Table 3. Agreement in respiratory rate count between health workers and reference standards
bpm – breath per minute
The agreement of RR counts between health workers and a reference standard was presented using Bland-Altman plots in two studies. Baker et al. reported a wide variation in readings, especially in the younger children. The mean difference was -0.6 bpm with limits of agreement (LOAs) from -25.4 to 23.9 bpm. Sinyangwe et al. reported the mean difference of -0.74 bpm with LOAs from -18.8 to 17.3 bpm. Health workers over-counted RR compared to the reference standard in general, but undercounted in children with higher RR.
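The quantities summarised in a Bland-Altman analysis can be sketched as follows, using the conventional definition of the limits of agreement as the mean difference ± 1.96 standard deviations. The sign convention (health worker count minus reference count) is an assumption, as the text does not state which direction the cited studies used; the function name is hypothetical.

```python
import statistics


def bland_altman(health_worker_counts: list[float],
                 reference_counts: list[float]) -> tuple[float, float, float]:
    """Return (mean difference, lower LOA, upper LOA) for paired RR counts.

    Differences are taken as health worker minus reference (assumed).
    """
    if len(health_worker_counts) != len(reference_counts) or len(reference_counts) < 2:
        raise ValueError("need two lists of equal length with at least two pairs")
    diffs = [hw - ref for hw, ref in zip(health_worker_counts, reference_counts)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return mean_diff, mean_diff - 1.96 * sd, mean_diff + 1.96 * sd
```

A mean difference close to zero combined with wide limits, as in the two studies above, indicates little systematic bias on average but large disagreement in individual measurements.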
Accuracy in fast breathing identification by health workers compared to reference standard
The summary results of the 15 included studies reporting accuracy of classification of fast and normal breathing compared to a reference standard are presented in Table 4. The accuracy of fast breathing identification differed in different age groups. The agreement was comparatively lower in children aged 0-2 months compared to older children. The accuracy of fast breathing identification was lower in children with uncomplicated illness, in comparison to children with severe illness.
Table 4. Studies reporting health worker classification of fast and normal breathing compared to a reference standard
PPV – positive predictive value, NPV – negative predictive value, CI – confidence Interval, SE – standard error
The median sensitivity, specificity, PPV, NPV, accuracy, and kappa value are presented in Table 5. The overall median sensitivity, specificity, and accuracy of classification of fast breathing were 77%, 86%, and 81%, respectively. The median sensitivity was marginally higher in children aged 0-2 months, and median specificity was slightly higher in children aged 2-59 months. The median sensitivity was higher in studies conducted in community settings, whereas the median specificity was higher in studies conducted in health facilities. Although sensitivities were similar, the specificity was higher in facility-based health workers compared to community-based health workers. The median sensitivity was slightly higher when RR was measured simultaneously by the health worker and reference standard, compared to when it was measured with a short delay. Both median sensitivity and specificity were higher if the prevalence of fast breathing was higher in the sample.
Table 5. Health worker classification of fast and normal breathing compared to a reference standard
PPV – positive predictive value, NPV – negative predictive value
Results of meta-analysis
Individual and summary estimates of sensitivity and specificity with 95% CI for all the studies included in the meta-analysis are presented in Figure 3. The pooled sensitivity was 78% (95% CI = 72-82), the pooled specificity was 86% (95% CI = 78-91), and there was considerable heterogeneity (I² = 72%). Figure 4 depicts the hierarchical summary receiver operating characteristic (HSROC) plot of sensitivity and specificity with summary point, summary estimates, 95% confidence region and 95% prediction region for all studies included in the meta-analysis.
Figure 3. Accuracy of health workers classification of fast and normal breathing compared to a reference standard. Forest plots of individual and summary estimates of sensitivity and specificity.
Figure 4. HSROC plot of sensitivity vs specificity of health worker classification of fast and normal breathing for all included studies.
Table 6 presents subgroup analysis according to child age, study settings, types of health workers, timing of assessment, and prevalence of fast breathing using univariate meta-regression.
Table 6. Subgroup analysis of sensitivity and specificity of health worker classification of fast and normal breathing compared to a reference standard
CI – confidence Interval, RR – respiratory rate
We conducted a sensitivity analysis excluding the study in which the WHO RR threshold was not used to classify fast breathing, to explore whether this could affect overall results (Figure S1 in the Online Supplementary Document). In this sensitivity analysis, the pooled sensitivity of fast breathing identification by health workers was 78% (95% CI = 72-83), almost identical to the result of the primary meta-analysis (where all studies were included); the pooled specificity increased slightly to 87% (95% CI = 81-92).
This systematic review demonstrated that the performance of health workers in the measurement of RR and identification of fast breathing varied across the studies. Overall performance in classifying fast and normal breathing was moderate, with sensitivity ranging from 61% to 88% and a pooled estimate of 78% from the meta-analysis. As the sensitivity is moderate, a significant number of children may have a missed diagnosis of fast breathing, potentially leading to poor outcomes . Some of these children may also have had other clinical signs of respiratory distress, like lower chest wall indrawing, that could have been identified, resulting in a true pneumonia case detection rate higher than these estimates. Further research is needed to investigate possible causes behind the inconsistency in diagnoses between health workers and reference standards, as well as to elicit the difficulties encountered by the health workers, thus improving sensitivity.
The specificity of the studies ranged from 69% to 91%, with a meta-estimate of 86%, demonstrating consistency in exclusion of a diagnosis of fast breathing pneumonia when the disease is not present. This is potentially encouraging, as it may imply that, if these guidelines are followed and RR counting is consistently applied during patient care, then few children would receive antibiotics unnecessarily, which could mitigate inappropriate use of antibiotics . It also means there is minimal unwarranted distress and economic cost for caregivers who would wrongly believe their child has pneumonia .
Although there was a moderate agreement in identifying fast breathing, the agreement in RR count between health workers and reference standards was relatively poor. The level of agreement was inconsistent across the studies. The median agreements were 39%, 47%, and 67% within ±2 bpm, ±3 bpm, and ±5 bpm, respectively. It is worth mentioning that obtaining good agreement on RR counts is challenging, even between experts . The difference in RR counts between two observers often does not change the diagnosis. Therefore, classification of RR into fast and normal breathing would be better than the continuous RR count agreements to evaluate the performance of health workers considering its clinical relevance.
The review found that the agreement in RR count between health workers and reference standards was poorer in children aged 0-2 months than in older children. Health workers may find it easier to count RR when it is slower, as in older children, than when it is fast, as in younger children. Interestingly, the review found that, although the specificity of fast breathing identification was higher in children aged 2-59 months, the sensitivity was higher in children aged 0-2 months. However, this finding for identifying fast breathing in newborns was based on two studies only. Sensitivity was also found to be slightly higher in infants compared to older children. More studies evaluating the accuracy of RR measurement and fast breathing identification in newborns and infants would be required to confirm this.
Community-based health workers performed better at counting RR and identifying fast breathing compared to facility-based workers. This might be because community-based workers are usually recruited and trained for a specific programme. They usually assess similar signs and symptoms repeatedly, spend more time on each assessment, develop better skills in assessing those specific signs and symptoms, and thus become more experienced, despite being lower cadres. On the other hand, facility-based workers must deal with different types of patients with a wide range of signs and symptoms. The sensitivity was higher in the studies conducted in the community settings compared to those in facility settings. The crowded and busy environment of the health facilities in LMICs might influence the performance of the health workers.
The interval between health worker assessment and reference standard assessment is also important in evaluating the performance of health workers. The review demonstrates marginally higher sensitivity when both assessments were done simultaneously compared to a short or long delay. The RR can change over a period of time and this variability may affect sensitivity and specificity in identifying fast breathing . Therefore, simultaneous measurement of RR by a health worker and a reference standard should be ideal. A short delay is not a valid reference standard for comparing RR but may be fair for comparing a binary pneumonia diagnosis. A prolonged period between the two measurements should be avoided.
The absence of an appropriate reference standard to evaluate the performance of health workers is a challenge. Most of the included studies used manual RR count by an expert as the reference standard. An expert is assumed to be more correct. However, the expert can over-count or under-count breaths. Therefore, using expert counting as a reference standard itself poses challenges due to uncertain accuracy. The possible biases of using a human expert count as the reference standard include the difficulty of measuring the RR over the same period and inconsistencies in human expert RR counting. One study used capnography as the reference, which is an automated method using carbon dioxide (CO2) in exhaled air to extract RR. However, the validity of using capnography to measure RR in field settings is yet to be established. Videography of the child assessment, with interpretation of the videos by an expert panel, could be recommended as a reference standard for future studies [50,51].
There were several limitations to this review. First, most of the studies included in this review were conducted in Africa, while only two were conducted in Asia, and one in Oceania. Therefore, the review findings might not be generalizable across LMICs. Second, RR was often measured by health workers as a part of a larger study. Such studies may not have provided sufficient information about the methods of measurement or comprehensive results. Third, in most studies, a varying level of training was provided to the health workers before their assessment. This could impact the results of this review. This also raises the question of whether the results of these studies reflect health workers’ performance in their day-to-day environments or their competency shortly after training. Performance of health workers during the study might not accurately reflect their day-to-day performance; it may also decay over time after training. Fourth, most of the studies used an expert person as the reference standard who observed the assessment performed by the health workers. The performance of health workers might improve under observation compared to when conducting their usual day-to-day activities. This means that the findings would reflect a best-case scenario of accuracy, and in the real-world context we might expect it to be even worse. Fifth, different studies used different definitions of RR agreement, ranging from two to five bpm between health workers and the reference standards. Therefore, it was not possible to combine the findings of all studies that reported agreement of RR measurement. Sixth, we have discussed some factors responsible for the variability of performance of health workers across the studies. There could be other contributing factors.
Finally, we could not include some studies that assessed health workers’ performance, including diagnosis and management of pneumonia, which would involve measuring RR and classifying fast breathing. It was unclear whether these outcomes were not measured, or were measured but not reported. Moreover, we could not include some studies in the meta-analysis because TP, FP, FN, and TN data were missing from the reports or could not be retrieved.
Despite these limitations, this review provides evidence on the need to strengthen the performance of health workers in measuring RR and identifying fast breathing pneumonia. Counting RR is the cornerstone of the diagnosis of pneumonia in children, but it is rarely practised in the field during real-world care. The performance of health workers could be enhanced by improved training, supportive supervision, ongoing performance monitoring, and feedback. Counting RR manually is challenging and often results in inaccurate diagnosis. Therefore, the development of improved pneumonia diagnostic aids, such as validated automated RR counters appropriate for use by health workers, might improve the diagnosis of pneumonia in LMICs. Appropriate methods, including a non-biased reference standard, should be used to evaluate the accuracy of health workers’ RR counts. Further implementation research could help define the best approach for improving their performance.
This review showed that the accuracy of RR measurement by non-physician health workers varied across the studies. While they could measure RR and identify fast breathing pneumonia with a moderate sensitivity and reasonable specificity, there is still a need for the improvement of RR measurement and identification of fast-breathing pneumonia by these health workers. This could be done through improved training, ongoing supervision, audit of performance, and improved diagnostic aids to measure RR and classify fast breathing accurately. The contribution of well-trained and well-equipped health workers is valuable in LMICs, where it is not always feasible for a child to see a doctor. This should decrease the burden on already scarce doctors and health centres in LMICs and may help reduce morbidity and mortality associated with pneumonia.
We are grateful to Ruth Jenkins, Academic Librarian at the University of Edinburgh, for her help in developing the search strategy.