Journal Article
Research Support, U.S. Gov't, P.H.S.
Add like
Add dislike
Add to saved papers

A semantic lexicon for medical language processing.

OBJECTIVE: Construction of a resource that provides semantic information about words and phrases to facilitate the computer processing of medical narrative.

DESIGN: Lexemes (words and word phrases) in the Specialist Lexicon were matched against strings in the 1997 Metathesaurus of the Unified Medical Language System (UMLS) developed by the National Library of Medicine. This yielded a "semantic lexicon," in which each lexeme is associated with one or more syntactic types, each of which can have one or more semantic types. The semantic lexicon was then used to assign semantic types to lexemes occurring in a corpus of discharge summaries (603,306 sentences). Lexical items with multiple semantic types were examined to determine whether some of the types could be eliminated, on the basis of usage in discharge summaries. A concordance program was used to find contrasting contexts for each lexeme that would reflect different semantic senses. Based on this evidence, semantic preference rules were developed to reduce the number of lexemes with multiple semantic types.

RESULTS: Matching the Specialist Lexicon against the Metathesaurus produced a semantic lexicon with 75,711 lexical forms, 22,805 (30.1 percent) of which had two or more semantic types. Matching the Specialist Lexicon against one year's worth of discharge summaries identified 27,633 distinct lexical forms, 13,322 of which had at least one semantic type. This suggests that the Specialist Lexicon has about 79 percent coverage for syntactic information and 38 percent coverage for semantic information for discharge summaries. Of those lexemes in the corpus that had semantic types, 3,474 (12.6 percent) had two or more types. When semantic preference rules were applied to the semantic lexicon, the number of entries with multiple semantic types was reduced to 423 (1.5 percent). In the discharge summaries, occurrences of lexemes with multiple semantic types were reduced from 9.41 to 1.46 percent.

CONCLUSION: Automatic methods can be used to construct a semantic lexicon from existing UMLS sources. This semantic information can aid natural language processing programs that analyze medical narrative, provided that lexemes with multiple semantic types are kept to a minimum. Semantic preference rules can be used to select semantic types that are appropriate to clinical reports. Further work is needed to increase the coverage of the semantic lexicon and to exploit contextual information when selecting semantic senses.

Full text links

We have located links that may give you full text access.
Can't access the paper?
Try logging in through your university/institutional subscription. For a smoother one-click institutional access experience, please use our mobile app.

Related Resources

For the best experience, use the Read mobile app

Mobile app image

Get seemless 1-tap access through your institution/university

For the best experience, use the Read mobile app

All material on this website is protected by copyright, Copyright © 1994-2024 by WebMD LLC.
This website also contains material copyrighted by 3rd parties.

By using this service, you agree to our terms of use and privacy policy.

Your Privacy Choices Toggle icon

You can now claim free CME credits for this literature searchClaim now

Get seemless 1-tap access through your institution/university

For the best experience, use the Read mobile app