Introduction

Cardiotocography (CTG) is a screening tool that aims to detect fetal heart responses to ongoing intrapartum hypoxic or mechanical stress during labour. The ability of CTG-monitoring to accurately detect intrapartum hypoxic-stress has been questioned because of its high false-positive rate and the lack of a valid gold-standard for intrapartum-fetal-hypoxia detection against which to compare it. The recent Cochrane-Systematic-Review on intrapartum-fetal-monitoring1 concluded that the use of CTG-monitoring increased the rate of C-sections and instrumental deliveries without a significant reduction in the rates of perinatal death or cerebral palsy. Several confidential enquiries into poor perinatal outcomes have highlighted that CTG-misinterpretation is still one of the key avoidable issues. Between 2000 and 2010, the National Health Service (NHS) Litigation Authority2 in the UK identified 300 claims involving CTG-misinterpretation, with an estimated value of £466 million. It is estimated that in the UK between 500 and 800 babies die or are left with severe brain injuries every year, and CTG-misinterpretation has been found to be a contributing factor in 49% of all the cases reported3. Misinterpretation of CTGs stems mainly from two components: the clinical interpretation by the practitioner, and the historical lack of clear consensus among international and national guidelines. Recently, the use of confusing guidelines on CTG-interpretation based on ‘pattern-recognition’ has been identified as a source of variability affecting intra- and inter-observer agreement4. The human element is another strong source of variability, as even ‘CTG-Experts’ have been shown to change their opinion once they are made aware of the neonatal outcomes5. Furthermore, there is still no reliable technology able to alleviate this issue6.
The only aspect that seems to have had a positive impact on inter- and intra-observer agreement of CTG-interpretation is intense training and education7–9. Nonetheless, there are still controversies about the standardisation and efficacy of the current training schemes offered worldwide10,11. Therefore, improving training in CTG-interpretation seems to be crucial to improving perinatal outcomes. Our hypothesis is that intense fetal-physiology-based training enhances inter- and intra-observer agreement as well as levels of self-confidence and knowledge. Although some authors12, ‘appealing to the stone’, venture to refute the fetal-physiology-approach in favour of the pattern-recognition-approach, it is evident that guidelines based on pattern recognition are contributing to poor perinatal outcomes and increased intrapartum operative interventions4. The latest Each Baby Counts Report, published by the Royal College of Obstetricians and Gynaecologists13, highlights that 33% of cases were due to CTG-misinterpretation, and that in 72% different care might have resulted in a different outcome. In contrast, intense training on the use of fetal-physiology to interpret CTG-traces has been reported to be associated with improved perinatal outcomes9. The objective of this study was to assess the level of agreement, the sources of discrepancies and the associated human factors in CTG-interpretation among staff trained in the fetal-physiology-approach at the maternity unit of St George’s University Hospitals NHS Foundation Trust in London, in order to gain a deeper insight into this method, which is currently in vogue.
This hospital, one of the largest Teaching Hospitals in London with approximately 5000 births/year, was in 2010 the first centre in the UK to introduce mandatory competency testing on CTG-interpretation for all staff providing intrapartum care, after implementing intense training in CTG-monitoring based on the fetal-physiology-approach. The training is provided by a team of highly experienced obstetricians and midwives (the CTG-Team in this document). This dedicated CTG-Team has received national awards for its outstanding performance in ensuring a low intrapartum C-section rate and a low rate of hypoxic-ischaemic encephalopathy compared with other Tertiary Teaching Hospitals in London.

Methods

A total of 25 midwives and 7 doctors, approximately 10% of the total clinical staff, were asked to interpret anonymised colour-printed copies of five different CTGs [Fig.1]. Traces were accompanied by the relevant clinical history. Three traces corresponded to ultrasound-transducer recordings, and the other two were CTG-STAN recordings. Along with each copy, a questionnaire with closed and open questions was provided (supplemental material). The five CTGs were deliberately selected for features that give rise to differences in interpretation. The same questionnaire had previously been completed by the Hospital CTG-Team, and their responses were used as the theoretical gold-standard for analytical purposes. The questionnaire responses included the categorisation of the CTG-traces, as well as the identification of any ongoing type of hypoxia. The traces were classified using local CTG-guidelines (NICE or STAN). Detection of the types of fetal hypoxia on the questionnaire was based on the criteria described in the scientific literature14,15: gradually evolving hypoxia, subacute hypoxia, acute hypoxia, and chronic hypoxia. The questionnaire also allowed the quantification of several aspects: (1) the proportion of concordance between the CTG-Team and clinical staff in CTG-classification by ‘CTG-guidelines’ as well as by identification of ‘types-of-hypoxia’, (2) the inter-rater (inter-observer) reliability within the staff, (3) the background knowledge in CTG-interpretation and (4) the level of self-reported confidence.

Statistical analysis

Agreement between the CTG categorisations provided by the CTG-Team and those given by the staff was assessed by the proportion of concordance (PC) with its 95% confidence interval (CI). The staff inter-observer reliability was assessed by the Fleiss-Kappa value (K). K-values were interpreted according to the Landis and Koch16 recommendations: a K<0.00 was considered poor, 0.00-0.20 slight, 0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 substantial, and 0.81-1.00 almost perfect. The remaining proportions, which were mainly descriptive, were expressed as raw percentages without CI. Differences between PCs were assessed by the chi-square test, with the significance level set at P<0.001. K-values were compared following Cumming and Finch17: two K-values were considered not significantly different if their 95% CIs overlapped. The statistical analysis was performed using the Real-Statistics Resource-Pack software (Release-4.3) for Excel-Microsoft-Office 2015 and IBM SPSS-Statistics for Windows, Version-25.0 (Armonk, NY: IBM Corp., 2017).
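To illustrate the reliability statistic named above, the following Python sketch computes Fleiss' kappa from a table of rating counts. It is only a sketch of the standard formula, not the Real-Statistics or SPSS procedure used in the study; the function name `fleiss_kappa` and the small example table are hypothetical and do not reproduce any study data.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a subjects-by-categories table of rating counts.

    counts[i][j] is the number of raters who assigned subject i (here, a
    CTG trace) to category j. Every subject needs the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    # Observed agreement: mean proportion of agreeing rater pairs per subject.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects

    # Chance agreement: sum of squared overall category proportions.
    n_categories = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_categories)]
    p_e = sum((t / (n_subjects * n_raters)) ** 2 for t in totals)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical table (not study data): 3 traces, 4 raters, 3 categories.
example = [
    [4, 0, 0],
    [0, 3, 1],
    [1, 1, 2],
]
kappa = fleiss_kappa(example)
```

In a design like the one described here, the table would have one row per CTG trace and one column per category, with each row summing to the number of staff who rated that trace.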

Ethical approval

Data were obtained as part of a university MSc programme and therefore followed the ethical guidelines of UK universities, in addition to the permission of the Hospital Local Ethics Committee and the voluntary participation of the staff. No patient-identifiable data were used in the study.

Results

CTG interpretation: Categorisation and types of hypoxia.

In total, 160 full CTG interpretations, five for each participant, were examined. The analysis of the differences between the CTG-Team and the clinical staff in CTG-interpretation applying local CTG-guidelines is displayed in Table-1. Overall, the categorisation of CTGs using the corresponding local-guideline presented a PC (95% CI) = 61.2% (53.6%–68.8%), representing a moderate agreement with the CTG-Team, and a K (95% CI) = 0.33 (0.316–0.362), representing a fair reliability. However, when the CTGs were interpreted by types of hypoxia, the PC was 76.1% (69.4%–82.8%) and K was 0.37 (0.35–0.39). Consequently, identification of the type of hypoxia, compared with local-guidelines as a method of CTG interpretation, presented a better PC (76.1% vs 61.2%, P=0.006) and slightly better reliability (K 0.37 (0.35–0.39) vs 0.33 (0.32–0.36)). Compared with other peer-reviewed published methods of interpretation based on pattern-recognition, interpretation by types-of-hypoxia presented a higher proportion of agreement and better reliability than studies with a similar sample of observers [Table-2].
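The reported intervals can be checked with a short Python sketch. It assumes a normal-approximation (Wald) interval for the PC, which the text does not state was the method actually used, and applies the CI-overlap rule from the statistical analysis to the two reported K intervals. The helper names `wald_ci` and `intervals_overlap` are illustrative only.

```python
import math


def wald_ci(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a proportion."""
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Guideline-based PC reported in the text: 61.2% of 160 interpretations.
pc_low, pc_high = wald_ci(0.612, 160)

# Reported 95% CIs for K: types of hypoxia vs local guidelines.
k_hypoxia = (0.35, 0.39)
k_guidelines = (0.32, 0.36)
k_overlap = intervals_overlap(k_hypoxia, k_guidelines)
```

Under the Wald assumption the first interval comes out at roughly 53.6% to 68.8%, in line with the reported value, and the two K intervals overlap, which is consistent with describing the difference in reliability as only slight.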

Background knowledge

The staff were asked to rank from 1 to 5 which source of knowledge helped them most in analysing each CTG. The options given were: 1) use of current guidelines, 2) own knowledge of fetal-physiology, 3) previous experience, 4) opinion of someone more senior and 5) similar case(s) previously discussed during a CTG meeting/training. Midwives reported relying most on guidelines (25.8%), then on fetal-physiology (22.2%), followed by experience (20.4%), discussion in previous CTG-meetings (16.3%) and opinion (15.2%). Doctors relied mostly on fetal-physiology (28.9%); experience (20.5%) and CTG-meetings (20.5%) were ranked joint second, followed by guidelines (19.3%) and opinion (10.7%) [Table-3; Fig.2-4].

Self-reported level of confidence

The staff were asked to rate their level of confidence on a 7-point scale from ‘not confident at all’ to ‘very confident’. Overall, 68% of them felt confident or very confident with CTG interpretation. Within the midwifery group, the highest proportion feeling confident or very confident was among Band-7 midwives (94.7%), followed by Band-6 (64.5%) and Band-5 (41.7%). Doctors followed a similar pattern: the highest proportion was among Consultants (100%), followed by senior doctors (90%) and junior doctors (57.1%) [Table-4; Fig.5].

Discussion

Since the purported rationale for categorising a CTG-trace is to identify the risk of potential hypoxia, our study shows that it is more practical to state directly whether a fetus is exposed to a hypoxic stress and the type of ongoing fetal hypoxia, if any. This may help avoid the use of confusing terminology such as ‘intermediate’, ‘suspicious’ or ‘pathological’ CTG-traces, which have no correlation with neonatal outcomes18. It is also worth mentioning that our method of calculating PC implies a double agreement: first between staff and second against the gold-standard. Therefore, we suggest that this method enhances the validity of our agreement results.

Sources of discrepancy: Pattern-recognition vs. Fetal-Physiology

The staff who did not agree with the diagnosis of CTG-1 and described it as suspicious or pathological were led by the number of uterine contractions shown on the tocograph rather than by non-reassuring features on the cardiograph. This suggests that features that are not formally part of the CTG-guideline table may interfere with the overall interpretation. The intense fetal-physiology training ensures that trained staff are also able to consider any ongoing excessive uterine activity contributing to abnormal features on the CTG-trace. Although being vigilant for any deviation from normality is crucial in maternity services, clinicians should also bear in mind that over-diagnosis may be equally harmful, as it may lead to expediting the delivery of a healthy fetus. An interesting source of discrepancy was noted in CTG-2, where up to 10 different nomenclatures were used to describe decelerations. Although none of those categories or nomenclatures would lead to management other than imminent delivery, the terminology stipulated by the guidelines was not followed. This reflects the inherent flaw in any guideline based on ‘pattern-recognition’ that relies on the morphological classification of decelerations, as this leads to significant inter- and intra-observer variability. According to the CTG-Team, the CTG-3 baseline was 108bpm, a non-reassuring feature as stipulated by the guidelines, and thus the CTG must be categorised as suspicious. However, the staff who categorised the CTG as normal did so because they considered that the baseline was ≥110bpm. The problem that arises from this 2bpm difference is that a baseline of 108bpm in a term baby can be perfectly normal, yet (strictly speaking) it can be categorised as a non-reassuring feature. Consequently, if any other non-reassuring feature appears while the baseline is defined as suspicious, the CTG would be categorised as pathological.
A similar scenario was seen in CTG-4, where the main discrepancy was the categorisation of the trace as intermediary (under STAN-guidelines) due to a baseline of >150bpm. Understanding the importance of accurately (and physiologically) interpreting the baseline is crucial to avoid over-diagnosis and potentially unnecessary interventions, because incorrect assessment leads to incorrect management. CTG-5 presented only one complicated-deceleration with a reassuring baseline. However, the STAN-guidelines, which differentiate between types of decelerations but do not specify the number of decelerations required per given period of time, promote confusion in the CTG-categorisation. As with CTG-2, the confusion arises from naming the decelerations or mixing the guidelines, which can reduce the rate of agreement on the basis of terminology alone. This highlights the role played by some guidelines based on ‘pattern-recognition’ in promoting confusion amongst clinicians.

Source of knowledge

When midwives progress from Band-5 to Band-6, logically, they start relying more on their own experience and less on the opinion of someone more senior. Most importantly, the data show that the more senior the midwife, the greater the reliance on fetal-physiology to interpret the CTG-traces and the lower the reliance on the CTG-guideline, until they become Band-7. This last group reported experience as the least valued option, and their decisions are mostly based on the use of guidelines followed by knowledge of fetal-physiology. A possible explanation for this phenomenon amongst Band-7 midwives (labour co-ordinators) may be their crucial role in having an ‘overall’ responsibility, which could create a conflict between taking defensive decisions following a closed written-guideline and trusting the fetal-physiology. A similar scenario is seen among senior doctors, but not among Consultants. However, since doctors also increase their reliance on the understanding of fetal-physiology with seniority, it is likely that the intense CTG teaching is promoting a switch from pattern-recognition to a physiological-approach amongst staff; this can be easily visualised in the radial graphs provided [Fig. 2-4].

Confidence on CTG-interpretation

The level of confidence varies according to professional grade. Both midwives and doctors gain self-confidence as they progress in their respective careers. Band-7 midwives (i.e. labour ward co-ordinators) reported a higher level of confidence than junior and senior doctors. This is likely due to the intense ‘cascade training’ on CTG-interpretation provided to Band-7 midwives by the CTG-Team to ensure that the unit is always staffed by a co-ordinator with an excellent knowledge of fetal-physiology. The lowest proportion feeling confident or very confident was among Band-5 midwives (i.e. newly appointed or junior midwives). This is understandable, considering that they are the professionals who are most likely to seek a senior opinion. In contrast, 100% of consultants felt confident or very confident. However, it is also interesting to highlight a considerable disagreement in CTG-interpretation between the consultants who took part in this study. This is likely due to some consultants incorporating individual ‘experience’ and disregarding the guidelines or fetal-physiology. It is therefore important to appreciate that some degree of overconfidence and/or non-concordance may exist amongst senior clinicians in any maternity team due to their experience. A multidisciplinary-team approach to CTG-interpretation that improves the knowledge of fetal-physiology may help improve concordance and reliability in CTG-interpretation.

Importance of Fetal-Physiology training and multi-professional approach

Our study highlights the challenges that arise when pattern-recognition is in place. On the one hand, relying mostly on CTG-guidelines, especially among junior staff, could act as a “horse-blinder”, producing an inability to see and understand the wider clinical picture, such as an appropriate fetal-heart-rate baseline, ongoing chorioamnionitis, maternal pyrexia, meconium-stained liquor, etc. This is usually manifested by a lower level of self-confidence in CTG-interpretation. On the other hand, among more senior staff, there is a chance of taking a more defensive and interventionist approach by relying more on CTG-guidelines and ‘personal experience’ than on the actual physiology and clinical picture. This could manifest as overconfidence. Therefore, it is vital to ensure that all staff receive intense training on fetal-physiology and the types of intrapartum-hypoxia, so that the pattern-recognition approach does not trump the physiological and scientific principles underpinning intrapartum fetal heart rate monitoring18. In support of the above, our study demonstrates that intense training on fetal-physiology not only improves K and PC but also increases knowledge and self-confidence in CTG-interpretation. This will contribute to reducing the variation in the management of labour and, hopefully, will improve intrapartum maternal and perinatal outcomes. Additionally, instead of using multiple CTG-guidelines based on pattern-recognition with confusing terminologies and different ‘features’, we suggest the use of ‘types of intrapartum-hypoxia’ to classify CTG-traces as a default method. This will help to better delineate the fetal ability to respond to and compensate for a hypoxic insult, which is the cornerstone of intrapartum-CTG.
Similar findings were reported in a recent study19 that analysed 52,187 births over an 11-year period and reported 81% agreement between clinicians when ‘types of hypoxia’ were used to classify the CTG-trace, instead of guidelines based on ‘pattern-recognition’.

Strengths and limitations

To the best of our knowledge, this is the first study to analyse inter-observer variability amongst 32 midwives and obstetricians of different grades and experience who have undergone intense training on fetal-physiology. Secondly, the ‘gold-standard’ was provided by a dedicated CTG-Team with expertise in CTG-interpretation, whose members have published extensively in this area and conduct CTG-Masterclasses in approximately 14 countries every year. Thirdly, in addition to inter-observer variability, we also analysed subjective levels of confidence in CTG-interpretation. The main limitation was the restriction to a single centre. However, the authors felt that it was best to conduct this study in a centre whose staff had received intense training on fetal-physiology and undergone mandatory competency testing on CTG-interpretation. Secondly, it may be argued that clinicians were provided with only 20 minutes of the CTG-trace instead of the whole trace; we accept that ‘Cycling’8,20, one of the most important CTG-features, could not be evaluated properly. However, it was felt that a 20-minute trace was sufficient to determine the type of hypoxia and that it reflected the real-life situation, where clinicians are expected to make crucial decisions based on short segments of the CTG-trace. Thirdly, the authors accept that some may argue that the number of observers was small. However, this was a complex study assessing inter-observer variability, and although only 32 clinicians took part, a total of 160 CTG-interpretations were analysed. Many studies on inter-observer variability in CTG traces have used fewer than 10 clinicians21–29.

Conclusions

This paper demonstrates that continuous education and intense ‘fetal physiology-based’ CTG-training by a specialised CTG-team increases knowledge of fetal-physiology and produces higher levels of staff confidence, reflecting better levels of agreement and reliability. Classification of CTG-traces by ‘type of intrapartum-hypoxia’ is preferable to CTG-guidelines. However, if CTG-guidelines are a sine qua non element of the maternity unit, they should be simple and easy to use, and they should be backed up by the immediate availability of senior input with appropriate knowledge of fetal physiology, able to recognise any ongoing hypoxic process. This approach may help reduce the pitfalls of pattern recognition amongst more junior members of staff. Development of a specialised “CTG-Team” formed by consultants and midwives to educate staff, and to review and discuss CTG-traces and outcomes, may help to create a multidisciplinary approach resulting in reduced inter-observer variability and increased staff confidence in CTG interpretation.

Conflict of interest:

No conflict of interest has been declared by the authors.

Contribution to authorship:

JG conceived the study and performed data collection. JG, DD and EC undertook data analysis and interpreted results. All authors contributed to the writing of the document and approved the final manuscript.

Funding:

JG: The main data were collected as part of a self-funded university MSc programme. A secondary analysis of the data was performed to prepare this manuscript. The remaining authors have no source of funding to declare.

Details of ethical approval:

Data were obtained as part of a St George’s University of London MSc programme (LA: HP7203/4/XY approved and signed on 22.06.2017), in addition to the permission of the Local NHS trust and the voluntary participation of the staff. No patient-identifiable data were used in the study.

Acknowledgments

We would like to thank Susan Heatley, Senior Lecturer at St George’s University of London; Lindsay Gillman, Senior Lecturer at St George’s University of London; Margaret Flynn, Deputy Head of Midwifery at St George’s University Hospitals NHS; the members of the ‘CTG team’: Mrs Virginia Whelehan, Miss Abigail Archer, Mrs Roise Heffernan, Miss Rachel Tree, Miss Isabelle Cornet and Mr Austin Ugwumadu; as well as the multidisciplinary Maternity Team at St George’s University Hospitals NHS Foundation Trust, London.