EPIGUS work: Datenbank Clean-Up Statistik Austria

EPIGUS is an Institute for Holistic Accident and Safety Research.
It is led by university professor DI Dr. Ernst Pfleger.
For information about its research, visit the institute's site at http://www.unfallforschung.at/.

Peter Kleissner wrote a program that cleans up databases obtained from Statistik Austria.

The clean-up process includes the following steps:

  • finding and resolving invalid record sets
  • finding and resolving gaps in record sets
  • exchanging sequence and ID numbers where necessary

Databases from Statistik Austria use a special fixed-length file format.
Each line of 401 characters is one record set, and every record set is encoded character by character.
A record set describes one accident with all of its accident data.
The database for one year consists of about 45,000 lines (~15 MB).

The program logs all errors it finds in an automatically generated log file:

---- Datenbank Clean-Up StA generated log file ----- 

Datum: Donnerstag, 22. Mai 2008 20:07
Eingabe Datenbank: C:\Users\Peter Kleissner\Desktop\Test.bak
Ausgabe Datenbank: C:\Users\Peter Kleissner\Desktop\Test_k.txt

Datensätze: 15 [davon ungültig: 0]
Folgeblattnummer-Fehler: 8
Lücken in Datensätzen: 3

0 ungültige Datensätze (mit ungültiger Datensatzlänge) übersprungen:

8 Folgeblatnummern geändert:
0255086 1 0255087
1143174 1 1143173
1143175 2 1143173
1143176 3 1143173
1143177 4 1143173
1299459 1 1299458
1299452 1 1299450
1299447 2 1299450

Lücken in 3 Datensätzen korrigiert:
0130517
0131786
0131791

The program was developed in C++ with Visual Studio 2005 Professional Edition (as a Win32 application without the use of frameworks).
It also has a built-in command-line mode.
The analysis and conversion "engine" is written in assembly language, which makes the program very fast (2 seconds for 45,000 lines).

Here are some screenshots of the program:


Last modified: 24 May 2008