This video clip is about the movie The Matrix, but it could just as well be about data analysis and statistical modeling.
24 days to go. I’ve still got plenty of things to do. Luckily, I’ve also got a schedule agreed with my supervisor: tight, but possible.
I wanted my thesis to contain a nice example analysis. I’ve tried to extract a question from the surgeons I know, but I haven’t succeeded, so my analysis will be a “fishing expedition”. After all my rants about the data structure and the proposed solution, it would be nice if my thesis actually used the normalized data to answer some question. It’s like in La Fontaine’s “The Mountain’s Delivery”:
A mountain having labor
With clamor rent the air
The neighbors who came running
Predicted she would bear
A city broad as Paris
Or at least a manor house,
But at the crucial moment
The mountain dropped a mouse.
How like so many authors
Who say they’ll set to paper
A vast Promethean epic
But all that comes is vapor.
I don’t want that. I need to think hard about a good example. Or concentrate on comparing the hospitals.
Normalizing the list of diagnoses, I’ve removed the “TGA+VSD” entry and replaced it with two separate diagnoses, “TGA” and “VSD”. If you don’t know anything about medicine, this seems perfectly fine. But if you do, you’ll know that “TGA+VSD” is not a mere co-occurrence: it’s a very different case from both TGA and VSD alone, even though both defects are present.
I’ve finished the normalization mapping of the International Nomenclature for Congenital Heart Diseases. The last problem to solve was the uniqueness of the resulting vector, a.k.a. backward compatibility. Every entry from the normalized factors can now be mapped back to the old nomenclature. However, inconsistent sets of nomenclature entries will be mapped back only to consistent ones, so it’s a kind of backward incompatibility.
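To make the round trip concrete, here is a minimal Python sketch. It is not my actual mapping: the three-entry dictionary, the function names `normalize` and `denormalize`, and the "prefer the largest entry" rule are all assumptions made up for illustration. It only shows how a redundant set such as {“TGA”, “TGA+VSD”} can come back as the single consistent entry “TGA+VSD”.

```python
# Hypothetical excerpt: each old nomenclature entry -> set of normalized factors.
NOMENCLATURE = {
    "TGA": frozenset({"TGA"}),
    "VSD": frozenset({"VSD"}),
    "TGA+VSD": frozenset({"TGA", "VSD"}),
}


def normalize(entries):
    """Return the union of the factors behind a set of nomenclature entries."""
    factors = set()
    for entry in entries:
        factors |= NOMENCLATURE[entry]
    return frozenset(factors)


def denormalize(factors):
    """Map factors back to entries, preferring the largest matching entry.

    This greedy rule is an assumption of the sketch, not a property of
    the real nomenclature.
    """
    remaining = set(factors)
    result = set()
    # Largest entries first; the name is a tie-breaker for determinism.
    ordered = sorted(NOMENCLATURE.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for name, parts in ordered:
        if parts <= remaining:
            result.add(name)
            remaining -= parts
    if remaining:
        raise ValueError(f"no nomenclature entry for: {sorted(remaining)}")
    return frozenset(result)


# A consistent set survives the round trip unchanged.
consistent = denormalize(normalize({"TGA+VSD"}))

# A redundant (inconsistent) set comes back as a consistent one.
repaired = denormalize(normalize({"TGA", "TGA+VSD"}))
```

With this rule a record listing “TGA” and “VSD” separately also comes back as “TGA+VSD”, which is exactly the kind of backward incompatibility described above.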
Why does the “nomenclature” tangle the diagnoses? Why add a new diagnosis “TGA+VSD” when there are “TGA” and “VSD” already?
The new nomenclature introduces even more tangled entries. Where is it going?
I think it’s because they can’t think of a way to analyze data with overlapping sets. It’s a run-away strategy. Unfortunately, they won’t run away. The overlapping still occurs, because it’s impossible to create a list with all the possible combinations of diseases.
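One way to look at overlapping sets, sketched below in Python, is to give every patient a row of 0/1 indicators, one per diagnosis, instead of a single label. The four patients are made up for illustration and do not come from any real database. The sketch also counts how many distinct non-empty combinations a list of n diagnoses allows, which is 2^n - 1.

```python
from collections import Counter

# Made-up patients for illustration only.
patients = {
    "p1": {"TGA"},
    "p2": {"VSD"},
    "p3": {"TGA", "VSD"},
    "p4": {"TGA", "VSD"},
}

# One column per diagnosis that occurs anywhere in the data.
factors = sorted(set().union(*patients.values()))


def indicator_row(diagnoses):
    """Return a 0/1 list with one position per factor."""
    return [1 if factor in diagnoses else 0 for factor in factors]


# Overlaps need no extra entries: a patient simply has several 1s.
matrix = {pid: indicator_row(dx) for pid, dx in patients.items()}

# How often each combination actually occurs in the data.
combo_counts = Counter(frozenset(dx) for dx in patients.values())

# Number of non-empty combinations a flat list would have to enumerate.
possible_combinations = 2 ** len(factors) - 1
```

With two diagnoses the flat list needs only three entries, but with 20 it would need 2^20 - 1 = 1,048,575, while the indicator table still has just 20 columns.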
There are no really precise, good parameters for evaluating the quality of care (QoC) in the hospitals that submit data to the EACTS Congenital Database or similar databases. The databases don’t contain detailed information about patients’ health. Let’s quickly review the possible QoC parameters.
I met my supervisor today. Finally, someone who has taken the chick under his wing.
Seven weeks to go and I’m just starting. He made a good point: I should include my previous work as part of the thesis and rephrase the title so that it covers building and using the information system. This way all that work will be part of the thesis instead of being just “a preparation”.