Computer models may soon be able to search through thousands of injury reports in large administrative medical and insurance datasets. Researchers at Purdue University’s School of Industrial Engineering and the Liberty Mutual Research Institute for Safety in Hopkinton, Mass., are developing the models.

The work could break new ground in the medical and insurance fields by making these records much easier to organize and use, because the data can be classified automatically based on particular words or phrases in each report.

Mark Lehto, an associate professor in Purdue University’s School of Industrial Engineering, said, “One goal is to identify the most important causes of injuries so that efforts could be directed toward reducing the burden of injuries in society.”

The reports, which are generally filled out by employers, health-care professionals or the claimants themselves, are currently classified by hand. Coders working for the organizations that use the data, such as the National Center for Health Statistics, hospitals and insurance companies, read through thousands of ‘injury narratives’ included in the reports. According to Lehto, this process is very labor-intensive.

The Purdue engineer and researchers at the Liberty Mutual Research Institute for Safety assigned codes to injury reports from workers’ compensation claims using two different models developed with statistical techniques known as ‘Bayesian methods’.

Lehto mentioned, “The predictions were quite good. The results were comparable to the human coders. The accuracy is surprising considering all of the misspellings, run-on words, abbreviations and inconsistent or missing punctuations seen in these workers’ compensation claim narratives.”

Lehto, the 2008 Liberty Mutual Research Institute for Safety visiting scholar, added, “Can you imagine reading through 10,000 of these narratives and trying to interpret what the cause of injury is and assign different codes?”

Insurance companies manage tens of thousands of claims each year. The research examined ways for a computer to efficiently assign each claim a one- or two-digit ‘event code’ from the categories developed by the U.S. Bureau of Labor Statistics.

Lehto said, “So now we are trying to take these vast sets of data, which have been limited in their utility due to the large expense in hiring manual coders, and we are able to glean important information from the injury narratives and come up with new knowledge on the potential causes and prevention of injuries.”

The new models could lead to programs that automatically code reports as they are filed.

Lehto noted, “These models can be easily updated to deal with new types of accidents they haven’t encountered before.”

The models predicted which categories human coders would assign to the reports. One model, known as ‘naïve’, examined individual words, while the other, called ‘fuzzy’, looked at sequences of words and phrases in the narratives, such as ‘fell off a ladder’.
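The article does not spell out how the two models score a narrative. A minimal sketch, assuming a standard naive Bayes classifier with add-one (Laplace) smoothing over single words for the ‘naïve’ model and, for the ‘fuzzy’ model, one common formulation that scores each category by its single strongest word or phrase (the largest P(category | feature)) instead of multiplying over every word. The class name `BayesNarrativeCoder`, the toy narratives and the category labels are hypothetical; they are not the researchers’ data or actual BLS event codes.

```python
import math
import re
from collections import Counter, defaultdict


def features(text, ngram):
    """Lowercased words plus, if ngram > 1, word sequences up to that length."""
    words = re.findall(r"[a-z]+", text.lower())
    feats = list(words)
    for k in range(2, ngram + 1):
        feats += [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    return feats


class BayesNarrativeCoder:
    """Hypothetical narrative coder: training is just counting features per code."""

    def __init__(self, ngram=1, alpha=1.0):
        self.ngram = ngram
        self.alpha = alpha                       # Laplace smoothing constant
        self.code_counts = Counter()             # number of cases per code
        self.feat_counts = defaultdict(Counter)  # feature counts per code
        self.vocab = set()

    def update(self, text, code):
        # Adding a newly coded case only increments counts, so the model can
        # absorb new kinds of narratives without retraining from scratch.
        self.code_counts[code] += 1
        for f in features(text, self.ngram):
            self.feat_counts[code][f] += 1
            self.vocab.add(f)

    def _p_feat(self, f, code):
        # Smoothed P(feature | code)
        total = sum(self.feat_counts[code].values())
        return (self.feat_counts[code][f] + self.alpha) / (total + self.alpha * len(self.vocab))

    def _p_code(self, code):
        return self.code_counts[code] / sum(self.code_counts.values())

    def _known(self, text):
        return [f for f in features(text, self.ngram) if f in self.vocab]

    def predict_naive(self, text):
        # argmax over codes c of  log P(c) + sum over features f of log P(f | c)
        feats = self._known(text)
        return max(self.code_counts, key=lambda c: math.log(self._p_code(c))
                   + sum(math.log(self._p_feat(f, c)) for f in feats))

    def predict_fuzzy(self, text):
        # argmax over codes c of  max over features f of P(c | f), via Bayes' rule
        feats = self._known(text)
        if not feats:
            return self.code_counts.most_common(1)[0][0]

        def posterior(f, c):
            joint = {k: self._p_feat(f, k) * self._p_code(k) for k in self.code_counts}
            return joint[c] / sum(joint.values())

        return max(self.code_counts, key=lambda c: max(posterior(f, c) for f in feats))


# Toy training narratives with hypothetical category labels.
train = [
    ("worker fell off a ladder while painting", "fall"),
    ("slipped and fell off a ladder in warehouse", "fall"),
    ("fell from roof onto ground", "fall"),
    ("hand caught in conveyor belt", "caught"),
    ("finger caught in machine rollers", "caught"),
    ("struck by falling box from shelf", "struck"),
    ("hit by forklift in aisle", "struck"),
]
naive = BayesNarrativeCoder(ngram=1)  # single words
fuzzy = BayesNarrativeCoder(ngram=2)  # words plus two-word phrases
for text, code in train:
    naive.update(text, code)
    fuzzy.update(text, code)

print(naive.predict_naive("employee fell off ladder"))  # fall
print(fuzzy.predict_fuzzy("employee fell off ladder"))  # fall
```

Because both models are built from simple counts, the `update` method also illustrates why such models can be refreshed cheaply as newly coded claims arrive.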

The researchers used a database of approximately 14,000 claims. About 11,000 were used to develop the models and 3,000 to test them. Lehto stressed that the 3,000 test cases were different from those used to develop the models. The models had never seen these cases before, yet they accurately predicted how human coders would classify them.
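The point of the evaluation is that accuracy is measured only on cases withheld from model development. A minimal sketch of such a holdout test, assuming the claims are available as (narrative, human code) pairs and are split at random; the helper names, the random split and the majority-class placeholder model are hypothetical, and the 14,000 synthetic cases merely stand in for the researchers’ data.

```python
import random
from collections import Counter


def holdout_split(cases, n_test, seed=0):
    """Shuffle (narrative, human_code) pairs; return (train, test) with no overlap."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def agreement(predict, test_cases):
    """Share of held-out cases where the model's code matches the human coder's."""
    hits = sum(predict(text) == code for text, code in test_cases)
    return hits / len(test_cases)


def train_majority(train_cases):
    """Placeholder model: always return the most common code seen in training."""
    top = Counter(code for _, code in train_cases).most_common(1)[0][0]
    return lambda text: top


# Synthetic stand-in for ~14,000 coded claims (not real data).
cases = [(f"narrative {i}", "fall" if i % 4 else "struck") for i in range(14000)]
train, test = holdout_split(cases, n_test=3000)
print(len(train), len(test))  # 11000 3000

model = train_majority(train)
print(round(agreement(model, test), 2))
```

Any trained coder’s prediction function could be passed to `agreement` in place of the placeholder; a trivial baseline like this one is useful mainly as a floor that a real model should clearly beat.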

These findings were detailed in a paper published in August in the journal Injury Prevention.