From The Embassy of Good Science
Latest revision as of 20:44, 1 February 2021

Improper data use (a bias distorting research results)

What is this about?

Researchers may handle data in a number of ways that render the results misleading.

Why is this important?

Improper data use undermines the ethos of science, and the resulting misleading findings can misguide readers and distort the production of knowledge.

Examples of improper data use include:

  • Massaging: … extensive transformations or other maneuvers to make inconclusive data appear … conclusive
  • Extrapolating: … predicting future trends based on unsupported assumptions …
  • Smoothing: discarding data points too far removed from expected … values
  • Slanting: … selecting certain trends in the data, … discarding others which do not fit …
  • Fudging: creating data points to augment incomplete data sets …
  • Manufacturing: creating entire data sets de novo, … [1]
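A toy simulation can show why even the milder practices above bias results. The sketch below is illustrative only (the sample size, seed, and number of discarded points are arbitrary assumptions, not from the source): it applies "smoothing" to noisy data by repeatedly discarding the observation furthest from the running mean, which shrinks the apparent spread and typically inflates the evidence for an effect.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=0.1, scale=1.0, size=50)  # true effect is tiny

def t_statistic(x):
    """One-sample t-statistic against a mean of zero."""
    return x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))

# "Smoothing": repeatedly drop the observation furthest from the
# running mean. The spread shrinks faster than the mean moves, so
# the t-statistic tends to drift toward spurious significance.
smoothed = data.copy()
for _ in range(10):
    worst = np.argmax(np.abs(smoothed - smoothed.mean()))
    smoothed = np.delete(smoothed, worst)

print(len(smoothed))
print(round(float(t_statistic(data)), 2), round(float(t_statistic(smoothed)), 2))
```

Removing the most deviant point always reduces the sample standard deviation, which is exactly why trimming "inconvenient" points without a pre-registered rule biases the analysis.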

Data dredging is testing many possible associations in a dataset to see if any of them are statistically significant. With enough comparisons, some will appear significant by chance, so data dredging produces false-positive results.

“When a large number of associations can be looked at in a dataset where only a few real associations exist, a P value of 0.05 is compatible with the large majority of findings still being false positives.” [2]
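The quotation above can be checked numerically. The following sketch (a minimal simulation; the sample sizes, seed, and critical value are assumptions for illustration, not from the source) tests 1,000 purely random "associations" against one outcome and counts how many clear p < 0.05 by chance alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tests, n = 1000, 100          # 1,000 candidate associations, 100 samples each
y = rng.normal(size=n)          # the outcome variable

false_positives = 0
for _ in range(n_tests):
    x = rng.normal(size=n)      # pure noise: no real association with y
    r = np.corrcoef(x, y)[0, 1]
    t = r * np.sqrt((n - 2) / (1 - r * r))  # t-statistic for r, df = n - 2
    if abs(t) > 1.98:           # two-sided critical value at alpha ~= 0.05
        false_positives += 1

print(false_positives)          # expected to be near 5% of n_tests
```

Since no real associations exist here, every "discovery" is a false positive; dredging a real dataset the same way yields findings that are mostly noise.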

Origin of words

There are several terms describing the act of data dredging. These include:


  • "Data Dredging"[3]
  • "Data Fishing"[4]
  • "Data Snooping"
  • "P-hacking"
  1. Sindermann C. J. “Winning the games scientists play” (Plenum Press, NY, 1982)
  2. Smith, George Davey, and Shah Ebrahim. "Data dredging, bias, or confounding: They can all get you into the BMJ and the Friday papers."
  3. Selvin, H. C., & Stuart, A. (1966). Data-dredging procedures in survey analysis. The American Statistician, 20(3), 20-23.
  4. Grover, L. K., & Mehra, R. (2008). The lure of statistics in data mining. Journal of Statistics Education, 16(1).

For whom is this important?

Principal investigators; researchers; policy makers; supervisors; postdocs; journal publishers; journal editors.

What are the best practices?

Related tools

By Jensen (2000) [1]

  • New data and cross-validation
  • Sidak, Bonferroni, and other adjustments
  • Resampling and randomization techniques
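Jensen's second remedy can be made concrete. The sketch below (the p-values are invented for illustration) applies the Bonferroni and Šidák adjustments, which shrink the per-test significance level so that the family-wise error rate across m tests stays near the nominal α.

```python
import numpy as np

# Hypothetical p-values from m = 6 tests run on the same dataset
p_values = np.array([0.001, 0.012, 0.034, 0.047, 0.21, 0.74])
m, alpha = len(p_values), 0.05

# Bonferroni: compare each p-value to alpha / m (always valid, conservative)
bonferroni_level = alpha / m
# Sidak: exact under independent tests, slightly less conservative
sidak_level = 1 - (1 - alpha) ** (1 / m)

print(round(bonferroni_level, 4), round(sidak_level, 4))
# Four p-values are below 0.05, but only one survives the correction
print(int((p_values < bonferroni_level).sum()))
```

Resampling and randomization techniques, Jensen's third remedy, replace these closed-form thresholds with an empirical null distribution built by permuting the data.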

By Suter & Cormier (2015) [2]

  • Performing one's own reviews of the sources of data
  • Checking for retractions and corrections
  • Requiring full disclosure of methods
  • Acquiring original data and reanalyzing it
  • Avoiding secondary sources
  • Avoiding unreplicated studies or studies that are not concordant with related studies
  • Checking for funding or investigator biases

Related cases

Convenience, dichotomization, stratification, regression to the mean, impact of sample size, competing risks, immortal time and survivor bias, and management of missing values.[3][4]

  1. Jensen, David. "Data Snooping, Dredging and Fishing: The Dark Side of Data Mining, A SIGKDD99 Panel Report." SIGKDD Explorations 1.2 (2000): 52-54.
  2. Suter, Glenn W., and Susan M. Cormier. "The problem of biased data and potential solutions for health and environmental assessments." Human and Ecological Risk Assessment: An International Journal 21.7 (2015): 1736-1752.
  3. Carmona-Bayonas A, Jimenez-Fonseca P, Fernandez-Somoano A, et al. Top ten errors of statistical analysis in observational studies for cancer research. Clinical & Translational Oncology. 2018;20(8):954-965.
  4. Ebrahim S, Sohani Z, Montoya L, et al. Reanalyses of randomized clinical trial data. JAMA. 2014;312:1024-32.

Other information

Virtues & Values: Accuracy
Good Practices & Misconduct: Questionable research practice