Journal Article
Review

MCN: A Comprehensive Corpus for Medical Concept Normalization.

Normalization of clinical text involves linking different ways of talking about the same clinical concept to the same term in the standardized vocabulary. To date, very few annotated corpora for normalization have been available, and existing corpora have been limited in scope, dealing only with the normalization of diseases and disorders. In this paper, we describe the annotation methodology we developed to create a new manually annotated wide-coverage corpus for clinical concept normalization, the Medical Concept Normalization (MCN) corpus. To ensure wider coverage, we applied normalization to the text spans corresponding to the medical problems, treatments, and tests in the named entity corpus released for the fourth i2b2/VA shared task. In contrast to previous annotation efforts, we do not assign multiple concept labels to named entities that do not map to a unique concept in the controlled vocabulary, nor do we leave such named entities without a concept label. Instead, our normalization method splits such named entities, resolving some of the core ambiguity issues. Lastly, we supply a sieve-based normalization baseline for MCN which combines MetaMap with multiple exact match components. The resulting corpus consists of 100 discharge summaries and provides normalization for a total of 10,919 concept mentions, using 3,792 unique concepts from two controlled vocabularies. Our inter-annotator agreement is 67.69% pre-adjudication and 74.20% post-adjudication. Our sieve-based normalization baseline for MCN achieves 77% accuracy in cross-validation. We also detail the challenges of creating a normalization corpus, including the limitations deriving from both the mention span selection and the ambiguity and inconsistency within the current standardized terminologies.
In order to facilitate the development of improved concept normalization methods, the MCN corpus will be publicly released to the research community in a shared task in 2019.
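
The sieve-based baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract states only that the baseline combines MetaMap with multiple exact match components, so the order of the sieves, the two toy dictionaries, the placeholder concept identifiers (CUI_MI, CUI_CXR), the "CUI-less" fallback label, and every function name below are assumptions made for illustration. MetaMap itself is not called; a stand-in function marks where such a tool would plug in.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A sieve maps a mention string to a concept ID, or returns None to pass it on.
Sieve = Callable[[str], Optional[str]]

CUI_LESS = "CUI-less"  # assumed label for mentions that no sieve resolves


def _canon(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(text.lower().split())


def exact_match_sieve(lookup: Dict[str, str]) -> Sieve:
    """Build a sieve that looks the canonicalized mention up in a dictionary."""
    table = {_canon(name): concept for name, concept in lookup.items()}

    def sieve(mention: str) -> Optional[str]:
        return table.get(_canon(mention))

    return sieve


def external_tool_sieve(mention: str) -> Optional[str]:
    """Stand-in for an external normalizer such as MetaMap (hypothetical, always passes)."""
    return None


def normalize(mention: str, sieves: List[Sieve]) -> str:
    """Apply the sieves in order and return the first concept any of them proposes."""
    for sieve in sieves:
        concept = sieve(mention)
        if concept is not None:
            return concept
    return CUI_LESS


def accuracy(gold: List[Tuple[str, str]], sieves: List[Sieve]) -> float:
    """Fraction of (mention, gold concept) pairs that the sieve cascade gets right."""
    if not gold:
        return 0.0
    correct = sum(normalize(mention, sieves) == concept for mention, concept in gold)
    return correct / len(gold)


# Toy data with placeholder concept IDs (not real vocabulary entries).
training_mentions = {"heart attack": "CUI_MI", "MI": "CUI_MI"}
vocabulary_names = {"myocardial infarction": "CUI_MI", "chest x-ray": "CUI_CXR"}

# Assumed ordering: most precise sieve first, external tool last.
sieves: List[Sieve] = [
    exact_match_sieve(training_mentions),
    exact_match_sieve(vocabulary_names),
    external_tool_sieve,
]

gold_pairs = [
    ("Heart  Attack", "CUI_MI"),
    ("Chest X-ray", "CUI_CXR"),
    ("shortness of breath", "CUI_SOB"),
]

if __name__ == "__main__":
    for mention, _ in gold_pairs:
        print(mention, "->", normalize(mention, sieves))
    print("accuracy:", accuracy(gold_pairs, sieves))
```

Placing the exact match sieves first reflects the usual sieve design of trying high-precision components before broader ones; a mention falls through to later sieves only when the earlier ones return nothing.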


