To model the heart

The nomenclature…

Currently, the EACTS Congenital Database uses a so-called nomenclature: a list of all the diagnoses and all the surgical procedures that the database recognizes. In other words, it’s the database ontology.

…and its flaws

When I talk with doctors about my analysis, every now and then I hear a sentence that begins with “But you don’t know that…”, followed by some additional facts that are not contained in the database. Why is that? With the data in hand, I should know everything the doctors wanted me to know, so nobody should have to tell me that I’m missing something. If one can’t make an analysis based on the data alone, the database doesn’t do its job. The job of the database is to provide all the information needed for the analysis.

I already have it in for the Aristotle Score. Now I’m targeting the nomenclature.

The list was probably created by surgeons, who don’t have knowledge about data structures and mathematical modeling. So they created a list of procedures, with the possibility of assigning many procedures to one operation. Unfortunately, they didn’t know that they had to keep one procedure per entry. For example, they created two procedures:

  • VSD repair
  • Arterial Switch Operation (ASO)

That’s fine. If a patient had one of them, you assign one of them. If a patient had both, you assign both. But then, suddenly…

  • Arterial Switch Operation (ASO) and VSD repair

Why? I can already assign both VSD repair and ASO to a patient. Why would I need a third entry, containing both of them? This violates the integrity of the data.

Let’s say I ask the system: “Please tell me about patients who had VSD repair: what else did they have?” Asked such a question, the system will look at the “VSD repair” procedure and show the other procedures attached to those patients. Since “ASO + VSD” is a different procedure, patients recorded under it will not be shown. This is a data model integrity violation.

Clearly, to make my analysis I need to repair those flaws by removing the “ASO+VSD” procedure and replacing it with two procedures: “ASO” and “VSD”.
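A minimal sketch of that repair in Python, assuming each operation is stored as a plain list of procedure names. The `COMBINED` mapping and both helper functions are hypothetical, and the two operations are made up; only the one combined entry discussed here is listed.

```python
# Hypothetical mapping from combined nomenclature entries to their parts.
COMBINED = {
    "ASO and VSD repair": ["ASO", "VSD repair"],
}


def normalize(procedures):
    """Replace combined entries with their parts, dropping duplicates."""
    result = []
    for name in procedures:
        for part in COMBINED.get(name, [name]):
            if part not in result:
                result.append(part)
    return result


def companions(operations, procedure):
    """Return the procedures that occur together with `procedure`."""
    found = set()
    for procs in operations:
        if procedure in procs:
            found.update(p for p in procs if p != procedure)
    return found


# Made-up operations, one list of procedures per operation.
operations = [
    ["VSD repair"],
    ["ASO and VSD repair"],
]

# On the raw data the combined entry is invisible to the query.
raw = companions(operations, "VSD repair")

# After normalization the same query finds ASO.
fixed = companions([normalize(p) for p in operations], "VSD repair")
```

With the raw lists the query finds no companion procedures at all; after normalization it reports ASO.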

New ontology

The database is bound to the given ontology. When given a list of procedures, it can talk using the procedures’ names.

I learned that the original list of diagnoses had thousands of entries. It was cut down to 200. So there are potentially thousands of distinguishable possibilities.

A heart is an object which consists of parts. Each part can suffer a number of defects. It should be possible to describe a diagnosis by specifying one or more defects in specific heart parts. This would create a new ontology: instead of a plain list of diagnoses, the database would speak the language of the anatomical parts of the heart, their features and the procedures performed. This would be much more powerful than the current list.

The same goes for procedures. Instead of a list of procedures, a doctor would have a set of heart parts, with the possibility of describing what was done to each of them.
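To make the idea concrete, here is a rough sketch of such a record in Python. All class names, part names and defect names are illustrative assumptions, not a proposed vocabulary; a real model would need a proper anatomical hierarchy.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Finding:
    """One defect observed in one anatomical part."""
    part: str
    defect: str


@dataclass(frozen=True)
class Action:
    """One thing done to one anatomical part during an operation."""
    part: str
    action: str


@dataclass
class HeartRecord:
    """Diagnosis and treatment described part by part."""
    findings: set = field(default_factory=set)
    actions: set = field(default_factory=set)

    def parts_with_defects(self):
        return {f.part for f in self.findings}

    def parts_operated(self):
        return {a.part for a in self.actions}


# A made-up patient: two defects, both addressed in the operation.
record = HeartRecord()
record.findings.add(Finding("ventricular septum", "defect"))
record.findings.add(Finding("great arteries", "transposition"))
record.actions.add(Action("ventricular septum", "closure"))
record.actions.add(Action("great arteries", "switch"))

# Questions can now be asked in anatomical terms.
untreated = record.parts_with_defects() - record.parts_operated()
```

Because findings and actions share the same part names, a question like "which defective parts were left untreated?" becomes a set difference instead of a lookup in a list of names.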

Data integrity

The current nomenclature allows inconsistent data to be entered. For example, some procedures cannot go together, such as variants of one procedure. If one variant was performed, the same procedure can’t be performed again in another variant, because it was already done.

Is it really a problem? In my opinion, yes. The current data set does contain inconsistencies, so it really is happening. You don’t see it in the general, aggregated online reports; it’s visible only in the source data.

Trying to fix this problem by applying validation rules like “Procedure X can’t go together with procedure Y” would be both time-consuming (to design and implement) and ineffective. There would be too many rules to think of them all.

A heart model would naturally ensure data integrity. In other words, all the “validation rules” would be embedded in the data model, with no need to specify them explicitly.
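One way this could look, as a sketch only: if an operation stores at most one technique per (part, action) pair, two variants of the same procedure cannot coexist, and no pairwise rule has to be written. The `Operation` class and the technique names are hypothetical.

```python
class Operation:
    """Procedures of one operation, at most one technique per part and action."""

    def __init__(self):
        # (part, action) -> technique; the key structure is the constraint.
        self._technique = {}

    def record(self, part, action, technique):
        key = (part, action)
        existing = self._technique.get(key)
        if existing is not None and existing != technique:
            raise ValueError(
                f"{action} of {part} already recorded as {existing!r}"
            )
        self._technique[key] = technique

    def techniques(self):
        return dict(self._technique)


op = Operation()
op.record("ventricular septum", "closure", "patch")
```

Recording a second variant, for example `op.record("ventricular septum", "closure", "primary closure")`, raises `ValueError`. The single structural rule stands in for every "variant X can't go with variant Y" rule.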


The database would use the names of the anatomical parts of the heart. The old diagnosis names would still be useful and would refer to specific predefined heart descriptions. Data analysis would improve significantly, because it would be aware of the relations between the heart parts. Right now, there are several VSD procedures. The current list considers them to be totally unrelated. An anatomical ontology would know that all of them are variations of one procedure.

  • It’s a generalization of the current Nomenclature and it would be backwards compatible.
  • Database reports and data analysis would speak the language of heart anatomy.
  • Related procedures would be represented as related (now, related procedures are represented as unrelated).
  • Integrity of procedures and diagnoses would be ensured.
  • Detailed, accurate heart description would result in detailed, accurate analysis and better understanding of the surgical results.
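The backwards compatibility and the "related procedures" points can be sketched with a lookup table from old names to anatomical descriptions. The `LEGACY` table, its entries and the helper names are assumptions made for illustration.

```python
# Hypothetical lookup: old nomenclature name -> (part, action, technique).
LEGACY = {
    "VSD repair, Patch": ("ventricular septum", "closure", "patch"),
    "VSD repair, Primary closure": ("ventricular septum", "closure", "primary"),
    "ASO": ("great arteries", "switch", None),
}


def related(name_a, name_b):
    """Old names are related when they do the same thing to the same part."""
    return LEGACY[name_a][:2] == LEGACY[name_b][:2]


def legacy_names(part, action):
    """All old names that describe `action` on `part`, in sorted order."""
    return sorted(
        name for name, desc in LEGACY.items() if desc[:2] == (part, action)
    )


vsd_variants = legacy_names("ventricular septum", "closure")
```

Old reports keep working because every old name still resolves to something, while new queries can group all variants of one procedure through the shared part and action.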

Aristotle Score criticism

A group of American surgeons involved in the congenital heart surgery databases is working on a scoring system called the Aristotle Score.

Aristotle Score

In order to evaluate the surgical results, the 30-day mortality was calculated and used to compare hospitals. But the mortality alone wasn’t a good parameter, because patients are very different from each other. It’s no wonder that a hospital has high mortality, when it treats very sick patients.

To help in evaluating the results, the term complexity was introduced. In order to express the complexity, the Aristotle Score was invented. It is a scoring system where a numerical value is assigned to each surgical procedure. The more complex the procedure, the higher the score. At first glance, it looks simple.

My doubts

Patient to procedure link

Each patient carries a binary response variable with the values dead or alive. How do you calculate mortality versus the Aristotle Basic Score (ABS)? You can calculate the mortality of a set of patients, not of a set of procedures, because it’s patients who die, not procedures. How do you relate patients to procedures?

One patient can have multiple procedures (see the data structure). Which procedure should have the outcome (dead/alive) assigned to it? When a patient has one procedure, it’s obvious. But what about the other half of the patients, who had more than one procedure?

The problem was “solved” by discarding procedures and leaving only one procedure per patient. It is equivalent to saying: “Only one procedure influences the outcome. Always.”
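The effect of that reduction can be illustrated with a short sketch. Which procedure is kept is not stated here, so the code assumes the highest-scoring one; the score values and the three patients are placeholders made up for illustration.

```python
# Placeholder score values, for illustration only.
SCORE = {"PDA closure": 2.0, "VSD repair": 5.0, "ASO": 9.0}


def primary_procedure(procedures):
    """Keep one procedure per patient (assumption: the highest-scoring one)."""
    return max(procedures, key=SCORE.__getitem__)


# Three made-up patients with clearly different operations.
patients = {
    "A": ["ASO"],
    "B": ["ASO", "VSD repair"],
    "C": ["ASO", "VSD repair", "PDA closure"],
}

reduced = {pid: primary_procedure(procs) for pid, procs in patients.items()}
discarded = sum(len(procs) - 1 for procs in patients.values())
```

After the reduction all three patients look identical, and three recorded procedures no longer take part in the analysis.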

Mortality vs Basic Score

When you want to know the ABS for a patient who has two procedures, how do you calculate it? Add both score values? No. You have to discard all the procedures but one.

About half of the patients in the database have more than one procedure. The calculations of ABS are inevitably biased.


You can’t add the ABS values of procedures. But existing reports average the ABS values (in other words, they calculate the mean ABS).

The image above illustrates the averaging. Please notice the plus (+) signs on the right side. They denote addition. If you’re not allowed to add Aristotle Score values, how come you’re allowed to average them?

The unit?

The unit is not defined. How do you interpret arbitrary numbers that just came out of doctors’ heads? Why do the values range from 1.5 to 15? Is 6 twice as complex as 3?


When two factors are combined, a new effect can be created. Interactions are very important for every medical study.

Expressing the complexity of a patient’s treatment with a single number means that the Aristotle Score complexity-adjusted analysis misses any possible interactions between procedures.

How is ABS related to the mortality?

A further effort is currently being made to check whether the mortality follows the Aristotle Basic Score. Being curious myself, I ran a binomial regression with a logit link function. As a result, I got a formula that describes the statistical relation of the ABS to the mortality and can be used to predict mortality from the ABS values. The regression itself is a process of finding the weights for the parameters that maximize the prediction precision.

With the formula and its weights ready, I predicted the mortality for each procedure and compared it to the actual mortality. The actual mortality for a procedure can differ by as much as a factor of five from what the ABS model predicts.
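For readers who want to see what such a fit involves, here is a self-contained sketch of a binomial regression with a logit link, fitted by Newton's method on the binomial likelihood. It is not the code used for the analysis, and the grouped data is made up; a statistics package would normally do this.

```python
import math

# Made-up grouped data: (score, operations, deaths). Not from the database.
GROUPS = [(3.0, 100, 2), (6.0, 100, 5), (9.0, 100, 10), (12.0, 100, 20)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(groups, iterations=25):
    """Fit P(death) = sigmoid(b0 + b1 * score) by Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, n, d in groups:
            p = sigmoid(b0 + b1 * x)
            resid = d - n * p        # observed minus expected deaths
            w = n * p * (1.0 - p)    # binomial variance of the group
            g0 += resid              # gradient of the log-likelihood
            g1 += resid * x
            h00 += w                 # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


b0, b1 = fit_logit(GROUPS)

# Predicted mortality for each score value in the toy data.
predicted = {x: sigmoid(b0 + b1 * x) for x, _, _ in GROUPS}
```

The two fitted weights are the intercept `b0` and the slope `b1`; `predicted` gives the model's mortality for each score, which can then be set against the observed deaths per group.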

Why not just take the mortality per procedure?

Why invent some arbitrary values and then verify (whatever that means) that they reflect the mortality? Why not just calculate the mortality per procedure? Isn’t it simpler? Advantages:

  • Estimated values are probabilities, real numbers between 0 and 1.
  • You don’t have to check if they follow the mortality, because they are the mortality.
  • You are able to calculate the expected mortality for each surgeon or each center. I already implemented it and sneaked it into the restricted part of the EACTS Congenital Database reports. The expected mortality is one of the options to choose. If you happen to be a member of the EACTS Congenital Database, try it.
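A rough sketch of the calculation, assuming one procedure per operation (how operations with several procedures are handled is not covered here) and using made-up records. The function names are hypothetical and this is not the code behind the database reports.

```python
from collections import defaultdict

# Made-up records: (center, procedure, died), one per operation.
RECORDS = [
    ("X", "VSD repair", False),
    ("X", "VSD repair", False),
    ("X", "VSD repair", False),
    ("X", "ASO", True),
    ("Y", "VSD repair", False),
    ("Y", "ASO", True),
    ("Y", "ASO", False),
    ("Y", "ASO", False),
]


def mortality_per_procedure(records):
    """Observed mortality of each procedure over the whole data set."""
    ops = defaultdict(int)
    deaths = defaultdict(int)
    for _center, procedure, died in records:
        ops[procedure] += 1
        deaths[procedure] += int(died)
    return {p: deaths[p] / ops[p] for p in ops}


def expected_mortality(records, center, rates):
    """Mean per-procedure mortality over the case mix of one center."""
    mix = [procedure for c, procedure, _ in records if c == center]
    return sum(rates[p] for p in mix) / len(mix)


def observed_mortality(records, center):
    outcomes = [died for c, _, died in records if c == center]
    return sum(outcomes) / len(outcomes)


rates = mortality_per_procedure(RECORDS)
expected = {c: expected_mortality(RECORDS, c, rates) for c in ("X", "Y")}
observed = {c: observed_mortality(RECORDS, c) for c in ("X", "Y")}
```

In this toy data both centers have the same observed mortality, but center Y performs more of the riskier procedure, so its expected mortality is higher. That difference is what a complexity adjustment is supposed to capture.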

Expected mortality explained

Documentation update

I updated the EACTS Congenital Database documentation by adding descriptions of the Quality of Care and the Expected mortality.

The open secret revealed

What I didn't write in the documentation is that the expected mortality works well as a replacement for the Aristotle Score. It has significant advantages over the ABS. If you're a member of the EACTS Congenital Database, you can see it for yourself. Go to the Quality of Care charts here and:

  1. Change “Hospital Survival Axis” to “30-day Mortality Axis”
  2. Change “Basic Score Axis” to “Expected Mortality”
  3. Change “Split by surgeons” to “No unit split”
  4. Change “No procedure split” to “Procedures”
  5. Click the “Generate Report” button

You should see that for every procedure, the expected mortality matches the actual mortality. This is exactly how the expected mortality works. If the bubbles aren't perfectly in line, it's only because the mortalities per procedure were estimated some time ago and the data has changed since then.

Let's check the same with the Basic Score. If the Basic Score matches the actual mortalities, the bubbles should be placed along the line, just like with the expected mortality. Go to the report form again (you can use the “Back” button in your browser).

  1. Change “Hospital Survival Axis” to “30-day Mortality Axis”
  2. Leave the “Basic Score Axis” option
  3. Change “Split by surgeons” to “No unit split”
  4. Change “No procedure split” to “Procedures”
  5. Click the “Generate Report” button

What you actually see is nothing like a line. It's just a cloud, with bubbles spread randomly around. There are procedures with high ABS and low mortality. The opposite cases are also present. There isn't any visible relation, let alone an accurate one.

Careful with statistics

In 2005 there was an attempt to verify the correlation between the ABS and the mortality. The statistical study showed that the ABS factor is significant for the mortality model. Statistically, patients with high ABS had higher mortality than patients with low ABS.

Let's take a closer look at the data. Procedures are not equally popular. Some procedures are performed more often than others. In fact, just a few procedures account for the majority of the population. For the statistical study, it was enough to have just two procedures, one with low mortality and low ABS and a second with high mortality and high ABS. The rest of the procedures could have random values of the ABS and the statistical calculation would still show the ABS as a significant factor of the mortality model.

It shows how careful one should be with statistics. Before the statistics are applied, the naked eye should be used.
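The effect is easy to reproduce with made-up numbers. In the sketch below, two common procedures follow the score, while six rare procedures are deliberately set against it; all counts and score values are invented for illustration and the plain Pearson correlation stands in for the regression of the original study.

```python
import math

# Made-up procedures: (ABS, patients, deaths). Two common, six rare.
COMMON = [(3.0, 1000, 10), (12.0, 1000, 200)]
RARE = [(14.0, 20, 0), (13.0, 20, 1), (11.0, 20, 0),
        (4.0, 20, 6), (2.0, 20, 5), (5.0, 20, 4)]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def per_patient(procedures):
    """Expand procedures into one (ABS, died) pair per patient."""
    xs, ys = [], []
    for score, n, deaths in procedures:
        xs += [score] * n
        ys += [1] * deaths + [0] * (n - deaths)
    return xs, ys


# Patient-level view: dominated by the two common procedures.
r_patients = pearson(*per_patient(COMMON + RARE))

# Procedure-level view of the rare procedures only.
r_rare = pearson([s for s, _, _ in RARE], [d / n for _, n, d in RARE])
```

Across all patients the correlation between ABS and death comes out positive, even though among the rare procedures the score runs strongly against the mortality. A per-procedure plot shows this immediately; a single patient-level statistic hides it.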


Don't use the Aristotle Basic Score for the Quality of Care evaluation. Use the expected mortality: it's accurate and simple to understand.