The International Plant Names Index

Deduplication

No duplicate records were deleted when the three datasets were merged. Instead, we have started to link duplicate records so that a search presents only one record for a given name citation. Records considered to be duplicates (i.e. considered to refer to the same name in the same place of publication) are hidden from the user unless they choose to examine them. Because the GI and APNI datasets include more detailed records and have benefited from more standardization and verification, the default is to hide the IK record when it appears to be superseded by a record from GI or APNI. Eventually we will compare all duplicate records; where there are discrepancies, the correct record will be presented to the user, and the ‘rejected’ record will remain available for consultation. Thus no records are ever deleted from the database.
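The linking policy described above can be illustrated with a minimal sketch. It is not the IPNI implementation: the `Record` fields, the `SOURCE_RANK` ordering, the tie-break between GI and APNI, and the sample records are all assumptions made for illustration. The point it shows is that duplicates are flagged as hidden rather than deleted.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed ranking: the text says only that an IK record is hidden when a GI
# or APNI record supersedes it. GI and APNI are ranked equally here.
SOURCE_RANK = {"GI": 0, "APNI": 0, "IK": 1}


@dataclass
class Record:
    record_id: str      # hypothetical identifier
    source: str         # "IK", "GI" or "APNI"
    name: str
    publication: str    # place of publication
    hidden: bool = False


def link_duplicates(records):
    """Group records by name citation and hide all but one per group.

    Nothing is removed from `records`; superseded entries are only flagged.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[(rec.name, rec.publication)].append(rec)

    for group in groups.values():
        # min() keeps the first record among equally ranked sources
        # (an arbitrary tie-break chosen for this sketch).
        shown = min(group, key=lambda r: SOURCE_RANK[r.source])
        for rec in group:
            rec.hidden = rec is not shown
    return list(groups.values())


def visible(records):
    """Records presented by default in search results."""
    return [r for r in records if not r.hidden]


# Placeholder data, not real IPNI records.
records = [
    Record("IK-1", "IK", "Aus bus", "J. Example 1: 12"),
    Record("GI-1", "GI", "Aus bus", "J. Example 1: 12"),
    Record("IK-2", "IK", "Cus dus", "Other J. 3: 40"),
]
link_duplicates(records)
```

After linking, `visible(records)` returns the GI record and the unmatched IK record, while `records` still holds all three entries, so a hidden record can be shown again on request.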

Some 80% of the process of identifying and linking duplicate records has been accomplished electronically. The electronic matching routine has proved extremely accurate, but it is not foolproof. A tiny percentage of electronic matches have proved to be false matches, i.e. the record identified by the computer as a duplicate actually refers to a different place of publication. The process of checking electronic matches by eye and eliminating false matches is well under way. However, frequent users of IPNI are likely to encounter false matches occasionally, and we would be most grateful to be informed of these errors. All users who contribute in this way will be fully acknowledged in IPNI.

About 20% of the process of identifying and linking duplicate records cannot be accomplished electronically. This is because records that refer to the same name citation (i.e. the same name in the same place of publication) may differ for several reasons: deliberate corrections made by compilers while entering the data; typographical errors introduced by compilers while entering the data; and, in the case of IK, orthographic errors introduced by the optical scanning process that converted the original hard-copy IK to electronic form. A number of family specialists have already checked the problem records for their family and contributed thousands of matches made by eye. We would like to hear from you if you are a specialist on a particular family or group and would like to help us deduplicate records.
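To see why such records escape exact matching, and how near-matches might be queued for checking by eye, here is a hedged sketch using Python's standard `difflib`. The source does not describe IPNI's matching routine, so the normalisation rules, the 0.9 threshold, and the sample citations are all hypothetical.

```python
from difflib import SequenceMatcher


def normalise(text):
    """Lower-case and strip full stops and commas (assumed rules only)."""
    cleaned = text.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity ratio between 0 and 1 for two citation strings."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def candidate_pairs(citations, threshold=0.9):
    """Return near-identical (but not identical) pairs for manual review."""
    pairs = []
    for i, first in enumerate(citations):
        for second in citations[i + 1:]:
            if normalise(first) == normalise(second):
                continue  # exact matches need no review by eye
            score = similarity(first, second)
            if score >= threshold:
                pairs.append((first, second, score))
    return pairs


# Placeholder citations; the second imitates a scanning error ("u" read as "v").
citations = [
    "Aus bus, J. Example 1: 12",
    "Aus bvs, J. Example 1: 12",
    "Cus dus, Other J. 3: 40",
]
pairs = candidate_pairs(citations)
```

A pair flagged this way is only a suggestion. As the text notes, deciding whether two differing records really refer to the same name citation is left to a specialist.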

© Copyright 2004 International Plant Names Index
