We conducted a retrospective study using the EDR data of patients who received at least one comprehensive oral evaluation (COE) in the IUSD predoctoral clinics. We developed a rule-based computational program to diagnose patients’ gingivitis status from periodontal charting findings, which are recorded as discrete data (hereafter referred to as BOP%-generated diagnoses). We also created a natural language processing (NLP) program to retrieve diagnoses recorded by clinicians as free text in the clinical notes (hereafter referred to as clinician-recorded diagnoses). Next, we evaluated the performance of these programs and assessed the agreement between BOP%-generated and clinician-recorded diagnoses. Finally, we determined the reasons for agreement and disagreement between the two sets of diagnoses (see Fig. 1). The study protocol was reviewed and approved by the Indiana University Institutional Review Board (#1909819686, September 2019), which waived the need for informed consent because of the retrospective nature of the study. All methods were performed in accordance with the relevant guidelines and regulations of the journal Scientific Reports.
The study data consisted of EDR data (axiUm®, Exan software, Las Vegas, Nevada, USA), including periodontal examination findings and charting, for patients 18 years or older who received a COE between January 1, 2009, and December 31, 2014; data were taken from each patient’s first completed COE. We excluded patients who came to the IUSD emergency clinics, who did not have a complete COE, who did not have any records (for example, paper or electronic records), and who were edentulous. We included 28 teeth (central incisors, lateral incisors, canines, first and second premolars, and first and second molars) and excluded third molars from the periodontal charting data. This study period was chosen because we aimed to compare gingivitis diagnoses made using subjective and objective criteria (before the release of the 2018 criteria) with diagnoses made using only objective criteria. Although this paper focuses only on gingivitis, we also retrieved periodontal disease diagnoses18 using the same NLP approach so that we could exclude patients with periodontitis diagnoses (alone or together with gingivitis) recorded in the clinical notes. At IUSD, all patients receive a full-mouth hard and soft tissue examination during a COE. The soft tissue assessment includes a full-mouth visual assessment, periodontal charting, diagnoses, treatment planning, and prognosis assessment. Periodontal charting is recorded as structured data, whereas all other findings, including diagnoses, are recorded as free text in a clinical note referred to as the periodontal evaluation form (PEF).
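The inclusion and exclusion criteria above can be expressed as a single filter over each patient’s first completed COE. The sketch below is illustrative only: the `FirstCOE` record and all of its fields are hypothetical and do not reflect the actual axiUm® schema.

```python
from dataclasses import dataclass
from datetime import date

# Study window for the first completed COE (dates taken from the text).
STUDY_START = date(2009, 1, 1)
STUDY_END = date(2014, 12, 31)


@dataclass
class FirstCOE:
    """Hypothetical summary of one patient's first completed COE."""
    age_at_coe: int
    coe_date: date
    coe_complete: bool
    emergency_clinic_patient: bool
    has_records: bool
    teeth_present: int  # count after excluding third molars


def is_eligible(visit: FirstCOE) -> bool:
    """Return True if the patient meets every inclusion/exclusion criterion."""
    return (
        visit.age_at_coe >= 18
        and STUDY_START <= visit.coe_date <= STUDY_END
        and visit.coe_complete
        and not visit.emergency_clinic_patient
        and visit.has_records
        and visit.teeth_present > 0  # edentulous patients are excluded
    )
```

Expressing every criterion in one boolean function keeps the cohort definition auditable in a single place.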
Until 2018, student clinicians and their supervising faculty members diagnosed gingivitis using visual assessments, such as changes in gingival color, volume, and stippling, together with an objective assessment of BOP%5,19. Using these criteria, they classified patients by (1) extent (localized or generalized gingivitis) and (2) severity (mild, mild to moderate, moderate, moderate to severe, or severe). Since 2013, the Department of Periodontology at IUSD has conducted regular calibration sessions for the clinical faculty who supervise student clinicians20,21. During a COE, student clinicians record patients’ periodontal charting findings and diagnoses, which are then reviewed, discussed, and updated with a calibrated supervising faculty member.
Rule-based computational program to determine patients’ gingivitis from BOP%
We developed a computational program (Gingivitis_Diagnoser.py; see the Python program as a text file in supplementary material S1) that classified patients’ gingivitis status as healthy, localized gingivitis, or generalized gingivitis based on the 2018 BOP% criteria for an intact periodontium (see SM Table S1). Gingivitis_Diagnoser.py used Python built-in functions and methods such as readlines(), list(), map(), replace(), and len(), as well as regular-expression functions such as search(). First, the program created two temporary variables, “Total_teeth” and “Total_Sites”, which stored the number of teeth and the number of sites present in a patient’s dentition, respectively. For example, if a patient has 28 teeth present, the total number of sites is 28 × 6 = 168, which is stored in the “Total_Sites” variable. Next, the program counts the positive BOP sites in the entire dentition and stores this count in a temporary variable named “Total_bleeding_sites”. Finally, the program calculates the proportion of BOP sites (“Total_bleeding_sites”/“Total_Sites”). This program was applied to the entire dataset of 28,908 patient records, which contained a total of 17,104,308 observations of clinical attachment loss, BOP, mobility, furcation involvement, tooth number, and the six sites.
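The counting step can be sketched as follows. This is not the code in supplementary material S1: the chart format (tooth number mapped to six `(probing_depth_mm, bleeding)` site records), the Universal tooth-numbering assumption for third molars, and the function name `count_sites` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical site record: (probing depth in mm or None if not charted, BOP flag).
Site = Tuple[Optional[int], bool]
Chart = Dict[int, List[Site]]

SITES_PER_TOOTH = 6               # MB, B, DB, DL, L, ML
THIRD_MOLARS = {1, 16, 17, 32}    # assumes the Universal numbering system


def count_sites(chart: Chart) -> Dict[str, int]:
    """Compute Total_teeth, Total_Sites and Total_bleeding_sites for one patient."""
    total_teeth = 0
    total_bleeding_sites = 0
    for tooth, sites in chart.items():
        if tooth in THIRD_MOLARS:
            continue  # third molars are excluded from the analysis
        # A tooth counts as present only if all six probing depths are charted.
        if len(sites) != SITES_PER_TOOTH or any(pd is None for pd, _ in sites):
            continue
        total_teeth += 1
        total_bleeding_sites += sum(1 for _, bop in sites if bop)
    return {
        "Total_teeth": total_teeth,
        "Total_Sites": total_teeth * SITES_PER_TOOTH,
        "Total_bleeding_sites": total_bleeding_sites,
    }
```

For a chart in which tooth 3 is fully charted with two bleeding sites, tooth 14 is missing one probing depth, and tooth 1 is a third molar, only tooth 3 is counted, giving 1 tooth, 6 sites, and 2 bleeding sites.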
In summary, the program computes the numerator (total number of BOP sites, “Total_bleeding_sites”) by counting tooth sites with BOP recorded in the periodontal chart. It computes the denominator (total number of sites, “Total_Sites”) by multiplying the number of teeth present by six, because each tooth has six potential BOP sites. The program considers a tooth present if all six probing depth sites (mesiobuccal, buccal, distobuccal, distolingual, lingual, and mesiolingual) were recorded in the charting. Thus, if a patient has 48 BOP sites and 28 teeth with six probing depth sites each, the patient’s BOP score is 48/(28 × 6) = 48/168 ≈ 29%. Based on the BOP% criterion described in the 2018 classification5, this patient is diagnosed with localized gingivitis.
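The classification step can be sketched with the 2018 thresholds for an intact periodontium: BOP < 10% is gingival health, 10–30% is localized gingivitis, and > 30% is generalized gingivitis. The exact boundary handling in SM Table S1 is not reproduced here, so treating 10% and 30% as localized is an assumption; the function names are hypothetical.

```python
def bop_percent(total_bleeding_sites: int, total_teeth: int) -> float:
    """BOP% = Total_bleeding_sites / (Total_teeth * 6) * 100."""
    total_sites = total_teeth * 6
    if total_sites == 0:
        raise ValueError("no fully charted teeth; BOP% is undefined")
    return 100 * total_bleeding_sites / total_sites


def classify_gingivitis(bop_pct: float) -> str:
    """Map BOP% on an intact periodontium to a 2018-style diagnosis."""
    if bop_pct < 10:
        return "healthy"
    if bop_pct <= 30:  # assumption: 10% and 30% both fall in the localized band
        return "localized gingivitis"
    return "generalized gingivitis"


# Worked example from the text: 48 bleeding sites on 28 teeth.
score = bop_percent(48, 28)          # 48 / 168, about 28.6%
diagnosis = classify_gingivitis(score)
```

The worked example yields a BOP% of about 28.6%, which falls in the localized band, matching the diagnosis stated above.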
NLP program to retrieve gingivitis and periodontal disease diagnoses from clinical notes
Clinicians typically document PD diagnoses as free text in a textbox in the EDR. To retrieve these diagnoses automatically in an analyzable, structured format, we developed an NLP program (“Periodontal Disease Diagnoses Extractor.py”; see the Python program as a text file in supplementary material S2). First, to understand the diagnostic terms recorded by clinicians, we reviewed 125 PEFs and annotated words and text phrases that indicated gingivitis or periodontitis diagnoses (see Table 1). Clinicians typically record PD type (gingivitis or periodontitis), severity (mild, mild to moderate, moderate, moderate to severe, and severe), location (maxilla, mandible, tooth number), and extent (localized or generalized). Because gingivitis and periodontitis diagnoses are typically recorded in the same field of the PEF, our NLP program extracts information on both from the PEFs. The dataset included 22,990 PEFs containing diagnosis information, with a total word count of 145,173 and an average of 7.20 words per sentence.
To retrieve a patient’s detailed PD information (disease type, severity, extent, location, and region), we used an approximate string-matching (ASM) algorithm from the Natural Language Toolkit (NLTK) library22. We also used NLTK to tokenize the text into sentences, and we converted all words to lowercase. Sentence tokenization, also known as sentence segmentation, splits a text corpus into sentences or other meaningful spans of text that serve as the first level of tokens. The text was then further preprocessed by removing special characters and sentence delimiters such as periods (.), newline characters (\n), and semicolons (;). ASM finds text or dictionary entries that are similar to a query even when spelling or grammatical errors are present. Given a dictionary of N words, approximate string search returns all words that start with or match the query word while allowing up to K errors. The algorithm is based on the Levenshtein distance, which measures the distance between two strings as the minimum number of single-character insertions, deletions, and substitutions needed to transform one into the other. Typically, the user sets the minimum percentage match required. For example, if the word “periodontitis” is misspelled as “eriodontitis” or “poridontitis” (Levenshtein distances of 1 and 2, respectively), ASM can detect these variations in the clinical text and identify them correctly. We applied the same ASM approach to extract each patient’s disease type, severity, location, and extent.
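The preprocessing and fuzzy-matching steps can be sketched as below. This is a simplified stand-in, not the program in supplementary material S2: it uses a pure-Python Levenshtein distance instead of NLTK’s `nltk.edit_distance` and a regular-expression split instead of NLTK’s sentence tokenizer, so that it runs without NLTK. The lexicon, the length-normalized similarity, and the 80% threshold are hypothetical; location (tooth numbers, arches) is not extracted.

```python
import re
from typing import Dict, List

# Hypothetical lexicon; the study's annotated terms are listed in Table 1.
LEXICON: Dict[str, List[str]] = {
    "type": ["gingivitis", "periodontitis"],
    "severity": ["mild", "moderate", "severe"],
    "extent": ["localized", "generalized"],
}


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Percentage match: 100 * (1 - distance / length of the longer string)."""
    longest = max(len(a), len(b)) or 1
    return 100 * (1 - levenshtein(a, b) / longest)


def preprocess(note: str) -> List[List[str]]:
    """Lowercase, split on '.', newline or ';', keep alphabetic word tokens."""
    segments = re.split(r"[.\n;]+", note.lower())
    return [re.findall(r"[a-z]+", seg) for seg in segments if seg.strip()]


def extract(note: str, threshold: float = 80.0) -> Dict[str, List[str]]:
    """Return lexicon terms whose best fuzzy match meets the threshold."""
    found: Dict[str, List[str]] = {category: [] for category in LEXICON}
    for tokens in preprocess(note):
        for token in tokens:
            for category, terms in LEXICON.items():
                best = max(terms, key=lambda term: similarity(token, term))
                if similarity(token, best) >= threshold and best not in found[category]:
                    found[category].append(best)
    return found


result = extract("Generalized mild poridontitis; localised moderate gingivitis.")
```

Here “poridontitis” (distance 2, about 85% similar) maps to “periodontitis” and “localised” maps to “localized”, giving type `["periodontitis", "gingivitis"]`, severity `["mild", "moderate"]`, and extent `["generalized", "localized"]`. A production extractor would also need to link each severity and extent to its own diagnosis rather than pooling terms per note.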
Performance of computational and NLP programs
Two faculty members from the predoctoral comprehensive care clinic (authors LW and DS, a periodontist; hereafter referred to as experts) calculated BOP% using the 2018 gingivitis classification criterion5 (see SM Table S1). Each expert compared the manually calculated BOP% with the program’s output. After obtaining perfect inter-rater agreement (Cohen’s kappa = 1) for 50 patient cases, the two experts independently diagnosed gingivitis in 200 additional patient cases, yielding a reference standard of 250 patient cases. The BOP%-generated diagnoses were calculated using the program and compared with this reference standard. Precision, recall, and F-1 measure23 were also calculated to assess the program’s performance (see SM Table S2).
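Cohen’s kappa, used above to quantify inter-rater agreement, corrects the observed agreement p_o for the agreement p_e expected by chance: κ = (p_o − p_e)/(1 − p_e). A minimal sketch for two raters labelling the same cases follows; the function name is hypothetical, and library implementations (for example scikit-learn’s `cohen_kappa_score`) compute the same unweighted statistic.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two raters over the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and paired")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions per category.
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n**2
    if expected == 1:
        # Both raters used one identical category throughout; kappa is 0/0,
        # and agreement is treated as perfect by convention here.
        return 1.0
    return (observed - expected) / (1 - expected)
```

With identical labels the result is 1.0; for `["healthy", "healthy", "localized", "localized"]` versus `["healthy", "localized", "localized", "localized"]`, p_o = 0.75 and p_e = 0.5, so κ = 0.5.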
The two experts also compared the diagnoses entered as free text in the PEF with the output generated by the NLP program. Following guidelines (see SM Tables S1–S5), they first reviewed 50 records and obtained perfect inter-rater agreement (Cohen’s kappa = 1)24. Subsequently, each expert independently reviewed 150 additional records, yielding a reference standard of 350 records. While comparing the NLP output with the free text in the PEF, the experts categorized each patient’s diagnoses as true positive, true negative, false positive, or false negative. A detailed description of the manual review process, with examples, is provided in Sect. 2 of the supplementary materials and in SM Tables S3–S7. Finally, we compared the NLP program’s results with the experts’ assessments and calculated the program’s precision, recall, and F-1 score23.
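Once each record has been labelled TP, TN, FP, or FN, precision = TP/(TP + FP), recall = TP/(TP + FN), and F-1 is their harmonic mean. The sketch below assumes the expert labels are available as a list of those four strings (a hypothetical format); the counts in the usage line are illustrative, not study results.

```python
from collections import Counter
from typing import Iterable, Tuple


def precision_recall_f1(labels: Iterable[str]) -> Tuple[float, float, float]:
    """Compute precision, recall and F-1 from per-record 'TP'/'TN'/'FP'/'FN' labels."""
    counts = Counter(labels)
    tp, fp, fn = counts["TP"], counts["FP"], counts["FN"]  # TN is not used
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Illustrative counts only: 90 TP, 10 FP, 5 FN, 20 TN.
p, r, f = precision_recall_f1(["TP"] * 90 + ["FP"] * 10 + ["FN"] * 5 + ["TN"] * 20)
```

True negatives do not enter any of the three metrics, which is why they are commonly reported alongside accuracy or specificity when those are also of interest.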
Final data set
After applying the rule-based and NLP programs, we obtained patients’ gingivitis diagnoses from the periodontal charting and their clinician-recorded gingivitis and periodontitis diagnoses from the clinical notes. Next, we excluded the EDR data of patients who had periodontitis, or both gingivitis and periodontitis, recorded in the clinical notes. The final dataset included EDR data with clinician-recorded diagnoses of gingivitis or healthy gingiva on an intact periodontium, indicated by clinical attachment loss of less than 4 mm (see Fig. 2). In addition, it included only patients whose periodontal charting and clinician-recorded diagnoses were from the same visit date.
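The three final-dataset rules can be sketched as a filter over linked records. The `LinkedRecord` fields are hypothetical, and applying the 4 mm rule to the largest charted clinical attachment loss per patient is an assumption about how “intact periodontium” was operationalized.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Set


@dataclass
class LinkedRecord:
    """Hypothetical per-patient record after running both programs."""
    patient_id: str
    note_pd_types: Set[str]  # PD types found by the NLP program in the PEF
    max_cal_mm: float        # assumption: largest charted clinical attachment loss
    chart_date: date         # visit date of the periodontal charting
    note_date: date          # visit date of the clinician-recorded diagnosis


def build_final_dataset(records: List[LinkedRecord]) -> List[LinkedRecord]:
    """Keep records that satisfy the final-dataset rules described above."""
    kept = []
    for rec in records:
        if "periodontitis" in rec.note_pd_types:
            continue  # periodontitis alone or together with gingivitis
        if rec.max_cal_mm >= 4:
            continue  # not an intact periodontium
        if rec.chart_date != rec.note_date:
            continue  # charting and diagnosis must come from the same visit
        kept.append(rec)
    return kept
```

An empty `note_pd_types` set represents a note that recorded healthy gingiva, so such patients are retained.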
Agreement between BOP%-generated diagnoses and clinician-recorded diagnoses
According to Weiskopf et al.25, data are considered concordant when the same information recorded in two data fields agrees or is compatible. We calculated the percent agreement between BOP%-generated and clinician-recorded gingivitis diagnoses, both for disease status (healthy vs. gingivitis) and for disease extent (localized vs. generalized).
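Percent agreement is the share of paired diagnoses that match. The sketch below computes it at both levels; the label strings are hypothetical, and restricting the extent comparison to patients whom both sources diagnosed with gingivitis is an assumption, since the text does not state the denominator.

```python
from typing import Dict, List, Sequence


def percent_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Percentage of paired labels that are identical."""
    if len(a) != len(b) or not a:
        raise ValueError("diagnosis lists must be non-empty and paired")
    return 100 * sum(x == y for x, y in zip(a, b)) / len(a)


def status(dx: str) -> str:
    """Collapse extent labels to disease status (healthy vs. gingivitis)."""
    return "healthy" if dx == "healthy" else "gingivitis"


def agreement_summary(bop_dx: List[str], clin_dx: List[str]) -> Dict[str, float]:
    """Status-level and extent-level percent agreement between two sources."""
    status_pct = percent_agreement([status(d) for d in bop_dx],
                                   [status(d) for d in clin_dx])
    # Assumption: extent agreement only where both sources report gingivitis.
    pairs = [(b, c) for b, c in zip(bop_dx, clin_dx)
             if b != "healthy" and c != "healthy"]
    extent_pct = (percent_agreement([b for b, _ in pairs], [c for _, c in pairs])
                  if pairs else float("nan"))
    return {"status": status_pct, "extent": extent_pct}


summary = agreement_summary(
    ["healthy", "localized", "generalized", "localized", "healthy"],
    ["localized", "localized", "localized", "localized", "healthy"],
)
```

In this toy example the sources agree on status for 4 of 5 patients (80%) and on extent for 2 of the 3 patients both call gingivitis (about 66.7%).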
The two experts reviewed 145 randomly selected patient records to determine the reasons for agreement and disagreement between BOP%-generated and clinician-recorded diagnoses. Of these 145 records, 50 were reviewed by both experts to ensure high inter-rater reliability (Cohen’s kappa = 0.9, an excellent rating). Through discussion and consensus, the experts determined the reasons for agreement or disagreement between the two diagnoses. They also reviewed the notes recorded in the PEF to verify that the clinician-recorded diagnoses were accurate; inter-rater reliability for this assessment was 0.86 (Cohen’s kappa). This high agreement supported the accuracy of the clinician-recorded diagnoses. We therefore considered this manually evaluated dataset a gold standard, because each patient’s diagnosis was examined by five clinicians: the student, the periodontal resident, the supervising faculty member, and the two experts. The clinical note recorded in the PEF provided comprehensive information about the patient’s gingival health, including the BOP values recorded during periodontal charting.
Descriptive statistics with 95% confidence intervals were calculated for demographic variables (age, gender, race, and insurance status), stratified by clinician-recorded and BOP%-generated diagnoses. Age was grouped into 18–29, 30–44, 45–64, and 65 years or older, as in the national prevalence study by Eke et al.26 Patients’ gender, race, and insurance status were classified as shown in Table 2. Percent agreement was calculated to determine the agreement between clinician-recorded and BOP%-generated diagnoses.
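The age grouping and a 95% interval for a proportion can be sketched as follows. The text does not state which confidence-interval method was used; the sketch assumes a normal-approximation (Wald) interval, clipped to [0, 1], which is one common choice but performs poorly for small samples or extreme proportions. Function names are hypothetical.

```python
import math
from typing import Tuple


def age_group(age: int) -> str:
    """Age strata used in the text (following Eke et al.)."""
    if age < 18:
        raise ValueError("study population is 18 years or older")
    if age <= 29:
        return "18-29"
    if age <= 44:
        return "30-44"
    if age <= 64:
        return "45-64"
    return "65+"


def proportion_ci(count: int, n: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Proportion with a Wald interval (assumed method): p +/- z * sqrt(p(1-p)/n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = count / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)
```

For 50 of 100 patients, the sketch returns a proportion of 0.5 with an interval of roughly 0.402 to 0.598.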
This project was approved by the Indiana University Institutional Review Board (#1909819686) as Exempt research.
Contribution to the field statement
This study demonstrated that a substantial percentage of patients with BOP% < 10% on an intact periodontium could also have visual signs of gingival inflammation, such as redness and swelling, even though such patients are considered periodontally healthy under the 2018 diagnostic classification. Because visual signs were a major part of diagnosing gingivitis until 2018, periodic training and calibration exercises are needed to help clinicians and researchers apply the new classification consistently in their practice.