Official organisations that maintain databases containing personal
information need to devise better ways to protect individuals' privacy while
preserving the value of the information to researchers, academics argue.
A report by Carnegie Mellon University statistics professor George Duncan in
the journal Science claimed that traditional methods of 'de-identifying'
records, such as stripping away Social Security numbers or birthdates, are
inadequate to safeguard privacy.
Professor Duncan warned that a person who knows enough about the data pool
could use other characteristics to identify individuals.
The academic pointed out that he is the only person who holds a Ph.D. in
statistics and teaches in Carnegie Mellon's H. John Heinz III School of Public
Policy and Management, so any data set that included that information, even with
Duncan's name removed, could be used to determine his identity.
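Professor Duncan's example can be illustrated with a short sketch. The records, field names and the `unique_records` helper below are hypothetical and are not taken from the report; the code simply counts how many records share each combination of attributes and flags those that stand alone.

```python
from collections import Counter

# Hypothetical 'de-identified' records: names removed, other attributes kept.
records = [
    {"degree": "PhD statistics", "school": "Heinz School"},
    {"degree": "PhD economics", "school": "Heinz School"},
    {"degree": "PhD economics", "school": "Heinz School"},
    {"degree": "PhD statistics", "school": "Statistics Dept"},
    {"degree": "PhD statistics", "school": "Statistics Dept"},
]


def unique_records(rows, keys):
    """Return the rows whose combination of `keys` appears exactly once."""
    combos = Counter(tuple(row[k] for k in keys) for row in rows)
    return [row for row in rows if combos[tuple(row[k] for k in keys)] == 1]


# A record that is unique on these attributes can be tied to one person
# by anyone who already knows those attributes about that person.
exposed = unique_records(records, ["degree", "school"])
print(exposed)
```

In this made-up data set, only one record combines a statistics Ph.D. with the Heinz School, so removing the name does not hide whose record it is.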
This could have serious consequences when it comes to data that includes
information about a person's medical history or sexual behaviour, such as that
collected by the National Center for Health Statistics.
Unfortunately, the characteristics that can be used to 're-identify' records
are often the very information that makes the data useful to legitimate
researchers.
"The question is how data can be made useful for research purposes without
compromising the confidentiality of those who provided the data," said Professor
Duncan.
Possible solutions to this dilemma include administrative procedures that
limit data access to approved users who must abide by restrictions on the use of
information, and statistical methods that 'de-identify' records in such a way
that the user cannot readily reconstruct personal identities.
In order to be effective, these statistical transformations must be tailored
to how the data will be used so that researchers can see the information that
interests them while other characteristics remain veiled.
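As a rough illustration of tailoring a release to its intended use, the sketch below masks every field except those a given researcher needs. This is plain suppression, one of the simplest possible approaches and not necessarily one the report recommends; the `veil` function, the field names and the sample values are all assumptions made for the example.

```python
def veil(rows, fields_of_interest, mask="*"):
    """Return copies of `rows` with every field outside `fields_of_interest` masked."""
    return [
        {key: (value if key in fields_of_interest else mask)
         for key, value in row.items()}
        for row in rows
    ]


# Hypothetical survey records.
records = [
    {"age": 52, "zip": "15213", "diagnosis": "asthma"},
    {"age": 37, "zip": "15217", "diagnosis": "diabetes"},
]

# A researcher studying diagnoses sees that field; age and zip stay veiled.
released = veil(records, {"diagnosis"})
print(released)
```

A different study, for instance one on age and diagnosis, would call `veil` with a different set of fields, which is why the transformation has to be chosen with the research question in mind.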
"Achieving 'adequate' privacy will require engineering innovation, managerial
commitment, information cooperation of data subjects and social controls
(legislation, regulation, codes of conduct by professional associations and
response to reactions of the public)," Professor Duncan concluded.