Abuse of scoring systems

Apgar is a scoring system,

…a simple and repeatable method to quickly and summarily assess the health of newborn children immediately after childbirth. (…) The Apgar score is determined by evaluating the newborn baby on five simple criteria on a scale from zero to two and summing up the five values thus obtained. The resulting Apgar score ranges from zero to 10.

One of the criteria is the skin color, which can be blue all over, blue at extremities or normal. This is an ordinal variable, which means that the variable does not have number values, but named and ordered levels. Blue at extremities is worse than normal, blue all over is worse than blue at extremities. By transitivity, blue all over is worse than normal.

Apgar score is meant to provide a single number as an outcome. To achieve that, five ordinal criteria need to be aggregated. Unfortunately, there is no way to directly aggregate skin color with pulse, for example. However, numbers are easy to aggregate, by means of addition. Hence the idea of transforming levels to numbers and aggregating them.
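A minimal sketch of that aggregation in Python. The criterion names follow the common APGAR mnemonic (appearance, pulse, grimace, activity, respiration) and the two newborns are made up for illustration:

```python
# The five criteria, each scored 0, 1 or 2 (names follow the common mnemonic).
CRITERIA = ("appearance", "pulse", "grimace", "activity", "respiration")


def apgar(scores):
    """Sum the five criterion scores into a single Apgar value."""
    for name in CRITERIA:
        if scores[name] not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1 or 2")
    return sum(scores[name] for name in CRITERIA)


# Two hypothetical newborns with different problems.
baby_1 = {"appearance": 0, "pulse": 2, "grimace": 2, "activity": 2, "respiration": 2}
baby_2 = {"appearance": 2, "pulse": 2, "grimace": 2, "activity": 2, "respiration": 0}

print(apgar(baby_1), apgar(baby_2))
```

Both newborns get the same total although their profiles differ, so the sum alone no longer tells which criterion was abnormal.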

This is a somewhat dangerous approach. The main purpose of Apgar is:

…to determine quickly whether a newborn needs immediate medical care.

However, with the Apgar outcome in the form of a number, people may be quick to calculate its mean and standard deviation. Searching scholar.google.com for “mean apgar” returns some 400 documents. That is not a majority: there are 71 thousand documents containing the word “apgar”, so those 400 are only about 0.5%.

Calculating the mean and standard deviation of Apgar values wasn’t something that the score’s creator had in mind. Its purpose was to quickly assess whether a newborn needs medical care.


Apgar score values are not numbers. They are summed identifiers of the levels of five ordinal variables. To calculate statistics, the original data (the criteria values) should be used, as there are dedicated statistical methods for analyzing ordinal variables. These methods, as the reader may already have guessed, do not transform ordinal values into numbers in order to perform calculations on them.
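One way to see the problem: the numbers attached to ordinal levels are arbitrary as long as they preserve the order, and two order-preserving codings can reverse the comparison of group means. A minimal Python sketch with made-up groups and codings (nothing here comes from real Apgar data):

```python
from statistics import mean

# Two hypothetical groups of newborns, described by skin color only.
group_a = ["blue all over", "blue all over", "normal", "normal"]
group_b = ["blue at extremities"] * 4


def mean_under(coding, group):
    """Mean of a group after mapping each level to a number."""
    return mean(coding[level] for level in group)


# Both codings keep the order: blue all over < blue at extremities < normal.
coding_1 = {"blue all over": 0, "blue at extremities": 1, "normal": 4}
coding_2 = {"blue all over": 0, "blue at extremities": 3, "normal": 4}

a1, b1 = mean_under(coding_1, group_a), mean_under(coding_1, group_b)
a2, b2 = mean_under(coding_2, group_a), mean_under(coding_2, group_b)

print("coding 1:", a1, b1)  # group A has the higher mean
print("coding 2:", a2, b2)  # group B has the higher mean
```

Which group looks "better on average" depends only on the arbitrary choice of numbers, which is why the mean is not a meaningful summary here.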

When designing a survey for statistical analysis, the Apgar score must not be used. The five original criteria must be included in the survey instead.

Everything that applies to Apgar also applies to the Aristotle Score, which I have already criticized. Height and weight are numbers. Generally, things that are measured are numbers. Things that are assessed subjectively, like a newborn’s skin color, are ordinal and do not have numeric values. Aristotle Score values look like numbers. However, it’s important to bear in mind that they are not! Therefore, one must not calculate the mean or standard deviation of the Aristotle Score.

Current Basic Score reports are based on mean Basic Score values, which is an abuse of a scoring system. I suggest finding another method of evaluating the quality of care.

minus 4 days to go

I have proofread a hardcopy of my thesis, made final corrections and committed them to the repository.

Transmitting file data ……..
Committed revision 384.

I noticed that the error rate varied across chapters. I think the earliest parts were the worst: there was no page left without a change. The newest parts, however, were mostly OK, with just a few slight modifications.

Some of the corrections concerned integrity and continuity. I had expected to write or do some things that, in the end, I didn’t write or do. For example, I planned to include an appendix, which turned out to be too big and was finally removed. I spotted and removed two references to the non-existing appendix today. At first, I considered this appendix an integral part of the thesis. However, it could distract readers from the main concept, i.e. the normalization. I wouldn’t like to discuss the details of the way I have normalized the International Nomenclature for CHD. It is a task for a medical expert; I just had to do it in order to be able to move forward with my analysis.

2×2×2 days to go

I removed a huge appendix from my thesis to make it thinner, but the thesis is still growing. The current chapter, the analysis, contains dumps of the models, where one model can take up a whole page.

There are 2³ days left until the submission. I’ve promised my supervisor a final candidate on Sunday, so I’ve got today and tomorrow to do it. It’s going to be a busy weekend.

I am somewhat disappointed with the predictive weakness of the models. There are lots of false negatives, even though the classification threshold is low (5%). Fortunately, the classification is not a key point in my thesis. The models can still be used for a fair comparison of the hospitals.
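To make the terms concrete, here is a small Python sketch of classifying with a 5% threshold and counting false negatives. The predicted probabilities and outcomes are invented for illustration and are not taken from the thesis data:

```python
def confusion(probabilities, outcomes, threshold=0.05):
    """Count true/false positives and negatives at a given threshold.

    probabilities: predicted probability of death for each patient
    outcomes: 1 if the patient died, 0 otherwise
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, died in zip(probabilities, outcomes):
        predicted = p >= threshold
        if predicted and died:
            counts["tp"] += 1
        elif predicted and not died:
            counts["fp"] += 1
        elif not predicted and died:
            counts["fn"] += 1  # a death the model did not flag
        else:
            counts["tn"] += 1
    return counts


# Hypothetical model output and observed outcomes.
probs = [0.01, 0.03, 0.04, 0.06, 0.10, 0.20]
died = [0, 1, 1, 0, 0, 1]

result = confusion(probs, died)
print(result)
```

In this toy data two of the three deaths fall below the threshold, so they are false negatives even though the threshold is only 5%.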

Last but one step of the analysis

So far, everything I had been doing was preparation. Now, there are 1010₂ days to go and I’m starting the actual data analysis, i.e. the final multiple logistic regression. Two mighty servers are currently processing my data. They have already calculated the simple additive models without interactions. I hope they will finish the models with interactions by tomorrow.

After many conversations with my expert consultant, I have achieved results that do make sense to him. No revelations, but I don’t expect them anymore. It’s good enough when the regression results match his expectations. The calculated coefficients are informative, as they represent the size of the effect.
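For a logistic regression, "size of the effect" has a standard reading: exponentiating a coefficient gives the odds ratio associated with a one-unit change in that predictor. A minimal Python sketch, with an intercept and coefficient made up for illustration (they are not results from the thesis):

```python
import math

# Hypothetical values, chosen only for illustration.
INTERCEPT = -3.0
BETA = 0.7  # coefficient of a binary risk factor


def odds_ratio(beta):
    """Odds ratio for a one-unit increase in the predictor."""
    return math.exp(beta)


def probability(intercept, beta, x):
    """Predicted probability from a one-predictor logistic model."""
    z = intercept + beta * x
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    return p / (1.0 - p)


p_without = probability(INTERCEPT, BETA, 0)
p_with = probability(INTERCEPT, BETA, 1)

# The ratio of the two odds equals exp(BETA), up to rounding.
ratio = odds(p_with) / odds(p_without)
print(round(p_without, 4), round(p_with, 4), round(ratio, 4))
```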

Once I have the regressions ready, I’ll be ready to perform the hospital comparisons, the very final phase of the analysis.

My method of comparing hospitals greenlighted

My supervisor has greenlighted my method of comparing the hospitals. The basic idea behind my method is that two hospitals are considered different when the difference in mortality cannot be sufficiently explained by the risk factors. This method uses a statistical tool: binomial regression. All the calculations are well-defined. I’m also working on an intuitive presentation of the results, which will use position, color and transparency.
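The idea of a difference "explained by the risk factors" can be illustrated with something much simpler than the binomial regression used in the thesis: comparing observed deaths with the deaths expected from each patient's risk. This is a stand-in technique, not the method described above, and the hospitals and risk values below are invented:

```python
import math


def observed_vs_expected(deaths, risks):
    """Return (observed, expected, observed / expected) for one hospital.

    deaths: 1 if the patient died, 0 otherwise
    risks: each patient's predicted probability of death, assumed to
           come from some risk model (hypothetical here)
    """
    observed = sum(deaths)
    expected = math.fsum(risks)
    return observed, expected, observed / expected


# Hospital A treats low-risk patients, hospital B high-risk ones.
hospital_a = observed_vs_expected([1] + [0] * 9, [0.1] * 10)
hospital_b = observed_vs_expected([1] * 3 + [0] * 7, [0.3] * 10)

print(hospital_a)
print(hospital_b)
```

Raw mortality is three times higher in hospital B, yet both hospitals have about as many deaths as their case mix predicts, so in this toy example the risk factors account for the whole difference.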

This will be the finale of my thesis. 1110₂ days to go.

A new quality or mere co-occurrence?

Normalizing the list of diagnoses, I’ve removed the “TGA+VSD” entry, changing the representation into two separate diagnoses, “TGA” and “VSD”. If you don’t know anything about medicine, it seems perfectly fine. But if you do, you’ll know that “TGA+VSD” is not a mere co-occurrence: it’s a very different case from both TGA and VSD, although both the TGA and VSD defects are present.


I need a good question

I need a good question about congenital heart surgery that can be answered by looking at the EACTS Congenital Database. So far I’ve tried this one:

When a patient has both Coarctation of Aorta and VSD, what kind of treatment is better: one-stage or two-stage?

Unfortunately, there is not enough data in the database to answer this question. There is a difference in mortality, in favor of the two-stage treatment, but the sample is about 2.5 times too small to show that the difference is statistically significant.

Now, I want to perform an analysis using my shiny new normalized factors. I’ve asked two surgeons already, but no ideas yet. The problem is that they do have lots of questions, but the information required to answer them is simply not present in the database.

If no good question is provided, I’ll just do exploratory research, and hopefully I’ll find something. A good question would be much better, though.

Mapping finished

I’ve finished the normalization mapping of the International Nomenclature for Congenital Heart Diseases. The last problem to solve was the uniqueness of the resulting vector, a.k.a. backward compatibility. Every vector of normalized factors can now be mapped backwards to the old nomenclature. However, inconsistent sets of nomenclature entries will be mapped back only to consistent ones, so it’s kind of a backward incompatibility.
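A toy sketch of such a round trip in Python. The three-entry nomenclature, the greedy backward rule and the reading of "inconsistent" as "redundant" are all assumptions made for illustration; the real mapping is far larger and is not shown here:

```python
# Hypothetical mini-nomenclature: entry -> set of normalized factors.
NOMENCLATURE = {
    "TGA": frozenset({"TGA"}),
    "VSD": frozenset({"VSD"}),
    "TGA+VSD": frozenset({"TGA", "VSD"}),
}


def normalize(entries):
    """Forward mapping: union of the factors of all listed entries."""
    factors = set()
    for entry in entries:
        factors |= NOMENCLATURE[entry]
    return frozenset(factors)


def denormalize(factors):
    """Backward mapping: cover the factors, largest entries first."""
    remaining = set(factors)
    result = []
    by_size = sorted(NOMENCLATURE.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for entry, entry_factors in by_size:
        if entry_factors <= remaining:
            result.append(entry)
            remaining -= entry_factors
    if remaining:
        raise ValueError(f"no entries cover {sorted(remaining)}")
    return sorted(result)


# A redundant input set comes back as a single consistent entry.
round_trip = denormalize(normalize(["TGA", "TGA+VSD"]))
print(round_trip)
```

The factor vector always maps back to some set of entries, but not necessarily to the set that was originally recorded, which is the backward incompatibility mentioned above.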

Why did they want to tangle diagnoses

Why does the “nomenclature” tangle the diagnoses? Why add a new diagnosis “TGA+VSD” while there are “TGA” and “VSD” already?

The new nomenclature introduces even more tangled entries. Where is it going?

I think it’s because they can’t think of a way to analyze the data with overlapping sets. It’s a run-away strategy. Unfortunately, they won’t get away. The overlapping still occurs, because it’s impossible to create a list with all the possible combinations of diseases.