If I'm going to write my thesis by June, I need a subject, and I need it soon. One of the best picks would be something related to the EACTS Congenital Database, since I've been involved in it for a few years now and I know it pretty well.
Knowing that it will be about this database is not enough. To write a good thesis, I'll need a good subject. For some time, I thought it could be something fancy: a neat analysis with interesting results. Unfortunately, that isn't possible, because the database lacks explanatory variables. The dependent variable could be mortality or, at the level of a single patient, the success of the treatment. In this database, a patient is described only by gender, age, weight and a set of diagnoses. With an overall mortality of 5%, it's not possible to find a combination of explanatory variables that would give more than, say, a 50% probability of death. So prediction, the most interesting and valuable analysis, is doomed to fail.
My latest idea is to come down to earth and work on what this database really needs: decent reports. The main problems would be:
- Compare surgeons. It's not a trivial thing, as each surgeon performs different operations. Some surgeons perform only simple procedures, while others deal with really complex cases. The task would be to create a way to compare pairs of surgeons. I would expect some pairs of surgeons to be incomparable. So far, comparison has been attempted with things like the Aristotle Score, which relies on the infamous procedure-mortality report. The score values are arbitrary and their units are undefined.
- Compare hospitals. I'm sure hospitals would want that. Personally, I think it's more about surgeons (in other words, about people) than about hospitals, but there will be demand for it. It'll be just like comparing surgeons, only with a different grouping of observations.
- Find similar surgeons. I'm sure that it's possible and interesting to find surgeons that perform similar operations. Maybe they'll have issues to discuss?
- Find groups of patients with an increased risk of death. Since the explanatory variables are pretty simple, it's probably possible to find some combinations of gender, age, weight and diagnoses that give much worse results than average.
- The infamous procedure-mortality report. Right now, to obtain such a report, only one procedure is chosen for each patient (in a few ways, all of them unreliable), so the database is flattened to a one-patient-one-procedure form. It's cruel, it's ugly, it's stupid and I hate it. A new way of relating procedures to mortality must be found.
- Spam detection. I believe that some hospitals or some surgeons enter data selectively. It could be detectable, even though there is no training set.
- Data validation. Data analysis would lead to a set of rules relating diagnoses to procedures. If an unusual combination occurs, the system would point it out and ask for it to be checked.
- Patient clustering. I believe that patients in the database don't form a single population. They are more like a set of populations. It should be possible to find clusters of patients with similar qualities.
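The "find similar surgeons" and "some pairs are incomparable" ideas could start from something as simple as the minimal sketch below. It assumes each surgeon is described only by counts of the procedures they performed. Cosine similarity over the procedure mix and the `min_shared_cases` threshold are my own illustrative choices, not an established method, and the surgeon and procedure names (`S1`, `proc_A`, ...) and counts are hypothetical toy data.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two procedure-count vectors (0.0 = no procedures in common)."""
    shared = set(a) & set(b)
    dot = sum(a[p] * b[p] for p in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(surgeons: dict, name: str, top: int = 3) -> list:
    """Rank the other surgeons by how similar their procedure mix is to `name`'s."""
    target = surgeons[name]
    scores = [(other, cosine_similarity(target, mix))
              for other, mix in surgeons.items() if other != name]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top]


def comparable(a: Counter, b: Counter, min_shared_cases: int = 10) -> bool:
    """Hypothetical rule: two surgeons are comparable only if each has performed
    at least `min_shared_cases` operations of the procedures they have in common."""
    shared = set(a) & set(b)
    return (sum(a[p] for p in shared) >= min_shared_cases
            and sum(b[p] for p in shared) >= min_shared_cases)


# Hypothetical toy data: procedure counts per surgeon.
surgeons = {
    "S1": Counter({"proc_A": 40, "proc_B": 10}),
    "S2": Counter({"proc_A": 35, "proc_B": 12, "proc_C": 2}),
    "S3": Counter({"proc_D": 20, "proc_E": 5}),
}

print(most_similar(surgeons, "S1"))
print(comparable(surgeons["S1"], surgeons["S3"]))
```

Here S2 comes out as very similar to S1, while S3 shares no procedures with S1 and so is both dissimilar and, under this rule, incomparable. A real version would also have to account for case complexity, which is exactly where it gets hard.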
These problems split into smaller ones. What, besides mortality, could be considered a measure of the quality of care? This fundamental issue, still unresolved, makes many reports questionable.
Combined, these problems should be enough for an MA thesis. But I still have doubts. I've fought so many times and had to give up very often. I wanted to develop valuable reports, and I was forced to make stupid ones.
"You can't do that. Big names in America created the Basic Score and you cannot question it." They are big names in surgery, not in data analysis or statistics. Whatever, they are just big names. Now they could pick up any subject…
"This is too complicated. Nobody will understand it." Surgeons are educated, aren't they?
"Generally I like the idea, but we don't have time for this. Just rewrite the old report as it was." When my boss says "generally", it means "no". So what he meant was that he didn't like the idea but lacked the arguments to support his opinion.
Let's say I write this thesis and implement all the reports. I'm the kind of person who doesn't like his work rejected. Once I implement them, I'll want them online. And I'm pretty sure that won't happen.