Difference between revisions of "Theme:88b73549-fec0-4fb9-99f6-fe1055d6b76a"

From The Embassy of Good Science
(Created page with "{{Theme |Theme Type=Misconduct & Misbehaviors |Title=Improper data use (a bias distorting research results) |Is About=Researchers may handle data in a number of ways that can...")
 
Line 1: Line 1:
 
{{Theme
 
{{Theme
 
|Theme Type=Misconduct & Misbehaviors
 
|Theme Type=Misconduct & Misbehaviors
 +
|Has Parent Theme=Theme:3c6a13ad-6861-4a5f-bf5b-491693ee6b6d
 
|Title=Improper data use (a bias distorting research results)
 
|Title=Improper data use (a bias distorting research results)
 
|Is About=Researchers may handle data in a number of ways that can influence the results to become misleading.
 
|Is About=Researchers may handle data in a number of ways that can influence the results to become misleading.
Line 24: Line 25:
  
 
Origin: "Data Dredging" (Selvin & Stuart, 1966); "Data Fishing" (Grover & Mehra, 2008), “Data Snooping,” “P-hacking”
 
Origin: "Data Dredging" (Selvin & Stuart, 1966); "Data Fishing" (Grover & Mehra, 2008), “Data Snooping,” “P-hacking”
 +
<references />
 
|Important For=Principal investigators; Researchers; Policy makers; Supervisors; Postdocs; Journal publishers; Journal editors
 
|Important For=Principal investigators; Researchers; Policy makers; Supervisors; Postdocs; Journal publishers; Journal editors
|Has Best Practice==== Related tools ===
+
|Has Best Practice====Related tools===
 
By Jensen (2000) <ref>Jensen, David. "Data Snooping, Dredging and Fishing: The Dark Side of Data Mining, A SIGKDD99 Panel Report." SIGKDD Explorations 1.2 (2000): 52-54.</ref>
 
By Jensen (2000) <ref>Jensen, David. "Data Snooping, Dredging and Fishing: The Dark Side of Data Mining, A SIGKDD99 Panel Report." SIGKDD Explorations 1.2 (2000): 52-54.</ref>
  
* New data and cross-validation
+
*New data and cross-validation
* Sidak, Bonferroni, and other adjustments
+
*Sidak, Bonferroni, and other adjustments
* Resampling and randomization techniques
+
*Resampling and randomization techniques
  
 
By Suter & Cormier (2015) <ref>Suter, Glenn W., and Susan M. Cormier. "The problem of biased data and potential solutions for health and environmental assessments." Human and Ecological Risk Assessment: An International Journal 21.7 (2015): 1736-1752.</ref>
 
By Suter & Cormier (2015) <ref>Suter, Glenn W., and Susan M. Cormier. "The problem of biased data and potential solutions for health and environmental assessments." Human and Ecological Risk Assessment: An International Journal 21.7 (2015): 1736-1752.</ref>
  
* Performing own reviews of the sources of data,
+
*Performing own reviews of the sources of data,
* Checking for retractions and corrections,
+
*Checking for retractions and corrections,
* Requiring full disclosure of methods,
+
*Requiring full disclosure of methods,
* Acquiring original data and reanalyzing it,
+
*Acquiring original data and reanalyzing it,
* Avoiding secondary sources,
+
*Avoiding secondary sources,
* Avoiding unreplicated studies or studies that are not concordant with related studies, and
+
*Avoiding unreplicated studies or studies that are not concordant with related studies, and
* Checking for funding or investigator biases.
+
*Checking for funding or investigator biases.
  
=== Related cases ===
+
===Related cases===
 
Convenience, dichotomization, stratification, regression to the mean, impact of sample size, competing risks, immortal time and survivor bias, management of missing values. <ref>Carmona-Bayonas A, Jimenez-Fonseca P, Fernandez-Somoano A, et al. Top ten errors of statistical analysis in observational studies for cancer research. Clinical & translational oncology : official publication of the Federation of Spanish Oncology Societies and of the National Cancer Institute of Mexico. 2018;20(8):954-965.</ref> <ref>Reanalysis: Ebrahim S, Sohani Z, Montoya L, et al. Reanalyses of randomized clinical trial data. JAMA 2014;312:1024-32</ref>
 
Convenience, dichotomization, stratification, regression to the mean, impact of sample size, competing risks, immortal time and survivor bias, management of missing values. <ref>Carmona-Bayonas A, Jimenez-Fonseca P, Fernandez-Somoano A, et al. Top ten errors of statistical analysis in observational studies for cancer research. Clinical & translational oncology : official publication of the Federation of Spanish Oncology Societies and of the National Cancer Institute of Mexico. 2018;20(8):954-965.</ref> <ref>Reanalysis: Ebrahim S, Sohani Z, Montoya L, et al. Reanalyses of randomized clinical trial data. JAMA 2014;312:1024-32</ref>
 +
<references />
 
}}
 
}}
 
{{Related To}}
 
{{Related To}}
 
{{Tags}}
 
{{Tags}}

Revision as of 08:49, 20 May 2020

Improper data use (a bias distorting research results)

What is this about?

Researchers may handle data in a number of ways that make the results misleading.

Why is this important?

Improper data use undermines the ethos of science, and the misleading results it produces can misguide and distort the production of knowledge.

Examples of improper data use include:

Massaging: … extensive transformations or other maneuvers to make inconclusive data appear … conclusive

Extrapolating: … predicting future trends based on unsupported assumptions …

Smoothing: discarding data points too far removed from expected … values

Slanting: … selecting certain trends in the data, … discarding others which do not fit …

Fudging: creating data points to augment incomplete data sets …

Manufacturing: creating entire data sets de novo, … [1]

Data dredging is testing a large number of possible associations in a dataset to see if any of them are statistically significant. Because every test carries its own chance of a spurious hit, data dredging inflates the number of false-positive results.

“When a large number of associations can be looked at in a dataset where only a few real associations exist, a P value of 0.05 is compatible with the large majority of findings still being false positives.” [2]
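The arithmetic behind the quotation above can be sketched in a few lines of Python. This is an illustrative calculation only, not taken from the cited sources: the function names, the assumption that the tests are independent, and the example figures (1000 associations tested, 10 of them real, 80% power) are all hypothetical.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive among n_tests independent
    tests when no real association exists (assumes independence)."""
    return 1.0 - (1.0 - alpha) ** n_tests


def false_positive_share(alpha: float, power: float,
                         n_tests: int, n_real: int) -> float:
    """Expected share of 'significant' findings that are false positives.

    alpha   : significance threshold applied to each test
    power   : probability that a real association is detected
    n_tests : number of associations examined
    n_real  : number of associations that truly exist
    """
    expected_false = alpha * (n_tests - n_real)  # null associations that pass
    expected_true = power * n_real               # real associations that pass
    return expected_false / (expected_false + expected_true)


if __name__ == "__main__":
    # Hypothetical scenario: 20 unrelated tests at the 0.05 level.
    print(round(familywise_error_rate(0.05, 20), 3))
    # Hypothetical scenario: 1000 associations tested, only 10 real.
    print(round(false_positive_share(0.05, 0.8, 1000, 10), 3))
```

Under these assumed figures, roughly 49.5 false positives are expected against 8 true positives, so most of the "significant" findings would be spurious even though each test individually used a 0.05 threshold.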

Origin: "Data Dredging" (Selvin & Stuart, 1966); "Data Fishing" (Grover & Mehra, 2008), “Data Snooping,” “P-hacking”

  1. Sindermann C. J. “Winning the games scientists play” (Plenum Press, NY, 1982)
  2. Smith, George Davey, and Shah Ebrahim. "Data dredging, bias, or confounding: They can all get you into the BMJ and the Friday papers."

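For whom is this important?

Principal investigators; Researchers; Policy makers; Supervisors; Postdocs; Journal publishers; Journal editors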

What are the best practices?

Related tools

By Jensen (2000) [1]

  • New data and cross-validation
  • Sidak, Bonferroni, and other adjustments
  • Resampling and randomization techniques
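Two of the tools listed above can be illustrated with a short Python sketch: the Bonferroni and Sidak adjustments of the per-test significance threshold, and a simple randomization (permutation) test for a difference in means. This is a minimal example under stated assumptions, not the procedure described by Jensen: the function names and parameters are hypothetical, the Sidak formula assumes independent tests, and the permutation test uses random shuffles rather than full enumeration.

```python
import random
from statistics import mean


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


def sidak_alpha(alpha: float, n_tests: int) -> float:
    """Per-test threshold that gives a family-wise error rate of exactly alpha
    when the tests are independent."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Group labels are shuffled repeatedly; the p-value estimates how often a
    random relabelling gives a difference at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical study that examines 20 associations at an overall 0.05 level.
    print(bonferroni_alpha(0.05, 20), sidak_alpha(0.05, 20))
    # Hypothetical measurements for two groups.
    print(permutation_p_value([1, 2, 3, 4, 5], [101, 102, 103, 104, 105]))
```

The Sidak threshold is always slightly less strict than the Bonferroni one; both shrink as more associations are tested, which is what counteracts the inflation of false positives caused by dredging.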

By Suter & Cormier (2015) [2]

  • Performing own reviews of the sources of data,
  • Checking for retractions and corrections,
  • Requiring full disclosure of methods,
  • Acquiring original data and reanalyzing it,
  • Avoiding secondary sources,
  • Avoiding unreplicated studies or studies that are not concordant with related studies, and
  • Checking for funding or investigator biases.

Related cases

Convenience, dichotomization, stratification, regression to the mean, impact of sample size, competing risks, immortal time and survivor bias, management of missing values. [3] [4]

  1. Jensen, David. "Data Snooping, Dredging and Fishing: The Dark Side of Data Mining, A SIGKDD99 Panel Report." SIGKDD Explorations 1.2 (2000): 52-54.
  2. Suter, Glenn W., and Susan M. Cormier. "The problem of biased data and potential solutions for health and environmental assessments." Human and Ecological Risk Assessment: An International Journal 21.7 (2015): 1736-1752.
  3. Carmona-Bayonas A, Jimenez-Fonseca P, Fernandez-Somoano A, et al. Top ten errors of statistical analysis in observational studies for cancer research. Clinical & translational oncology : official publication of the Federation of Spanish Oncology Societies and of the National Cancer Institute of Mexico. 2018;20(8):954-965.
  4. Reanalysis: Ebrahim S, Sohani Z, Montoya L, et al. Reanalyses of randomized clinical trial data. JAMA 2014;312:1024-32

Other information
