Normalizing the data representation

I’ve spent many days working on the database browser and on normalizing the data structure. The original problem was that the so-called nomenclature, a flat list of diseases, couples factors that should be independent for the analysis.

The plan is to decouple the factors and create an alternative nomenclature. The new factors, which I call normalized factors, will carry the same information as the original nomenclature.

The nomenclature normalization

To decouple the factors, it’s necessary to have a specialized data browser. I’ve just written one, using the brilliant Django framework, but I didn’t write any spec on what its usage is going to look like. So let’s write it down. First, the vocabulary:

  • A factor is a disease, surgical procedure, general preoperative risk factor, non-cardiac abnormality or complication, coming from the original “Nomenclature”.
  • A normalized factor is like a factor, but it denotes only one property of the patient. For example, the factor TGA + VSD will be split into two normalized factors, TGA and VSD.

Then, the usage:

  1. Take a disease, for example Tetralogy of Fallot.
  2. Find all the factors that indicate the presence of this disease.
  3. Find the factors that occur together with the disease.
  4. Write them in a normalized way, i.e. one normalized factor describing one property of the patient.
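
As a rough illustration of steps 2 to 4, here is a minimal Python sketch. It assumes, purely for illustration, that a coupled factor is written as names joined by "+" (as in the TGA + VSD example). The real nomenclature will most likely need a hand-made mapping instead, and the function names and the sample list are hypothetical.

```python
# Hypothetical sketch: the "+" separator, the function names and the
# sample list are assumptions, not the actual nomenclature format.

def normalize(factor):
    """Step 4: split one coupled factor into normalized factors."""
    return [part.strip() for part in factor.split("+") if part.strip()]


def factors_indicating(disease, nomenclature):
    """Step 2: find all factors that indicate the presence of a disease."""
    return [f for f in nomenclature if disease in normalize(f)]


def co_occurring(disease, nomenclature):
    """Step 3: normalized factors that appear coupled with the disease."""
    found = set()
    for factor in factors_indicating(disease, nomenclature):
        found.update(normalize(factor))
    found.discard(disease)
    return sorted(found)


nomenclature = ["TGA + VSD", "TGA", "VSD", "ASD"]  # made-up sample
print(normalize("TGA + VSD"))
print(factors_indicating("TGA", nomenclature))
print(co_occurring("TGA", nomenclature))
```

With this sample, the factor TGA + VSD comes out as the two normalized factors TGA and VSD, and VSD is reported as occurring together with TGA.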

The patient history research

I believe that it’s necessary to look at the combinations of factors to see the frequent patterns. I’ve written a tool for the patient history research. The usage spec is:

  1. Take a disease
  2. Find the normalized factors that seem to be important for this disease. These are probably the ones that occur frequently together with it. Mark them as interesting for this disease. They will be used to build the patients’ histories.
  3. Look at the histories and counts. Decide whether to add or remove interesting factors.
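
The steps above could be sketched in plain Python roughly like this. The patient records, the factor labels and the helper functions are all made up for illustration. A "history" is simplified here to the unordered combination of interesting factors a patient has, which is an assumption; the actual tool may well keep the order of events.

```python
from collections import Counter

# Hypothetical data: each patient maps to a set of normalized factors.
patients = {
    "p1": {"TGA", "VSD", "PDA"},
    "p2": {"TGA", "VSD"},
    "p3": {"TGA", "VSD", "ASD"},
    "p4": {"ASD", "VSD"},
}


def co_occurrence_counts(disease, patients):
    """Step 2: count the factors occurring together with the disease."""
    counts = Counter()
    for factors in patients.values():
        if disease in factors:
            counts.update(factors - {disease})
    return counts


def history_counts(disease, interesting, patients):
    """Step 3: count patients per combination of interesting factors."""
    histories = Counter()
    for factors in patients.values():
        if disease in factors:
            # Simplification: a history is the sorted tuple of the
            # interesting factors present, ignoring their order in time.
            histories[tuple(sorted(factors & interesting))] += 1
    return histories


counts = co_occurrence_counts("TGA", patients)
interesting = {"VSD", "PDA"}  # chosen by hand after looking at the counts
histories = history_counts("TGA", interesting, patients)
print(counts)
print(histories)
```

Changing the `interesting` set and rerunning `history_counts` corresponds to the add-or-remove decision in step 3.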

Analyzing the patient histories can help identify common patterns of disease/treatment combinations. Those patterns could be treated as groups of patients of equal treatment complexity.


Author: automatthias

