1 - Introduction To Health Care Data Analytics (Part 2)
Data, Information, Knowledge, Wisdom Hierarchy
(Ackoff, 1989)
Types of Data in an Electronic Health Record
Understanding the Data: Scales of Measure
• Data come in many forms, and those forms determine what can or cannot be done with the data
• For example, two patient names cannot be added together
• Likewise, the relative distance between two measurements can be interpreted only for certain kinds of data
• There are four scales of measure: nominal, ordinal, interval, and ratio
Scales of Measure: Nominal
• From the Latin nomen, meaning “name”
• Names, labels, categories
• Examples:
o Patient names (John Doe, Maria Garcia)
o Drug names (Ampicillin, Valium)
o Eye color (blue, brown, green, gray)
o Gender (male, female, unknown)
o Religious preference (Catholic, Jewish, none)
• May be mapped to a number in a database, but the number is only a code, not a quantity
o Example: brown eyes=1, blue eyes=2
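The nominal points above can be illustrated with a minimal Python sketch. It assumes eye color is stored as numeric codes (brown=1 and blue=2 as in the example; green=3 and gray=4 are hypothetical additions). Counting the codes and taking the most frequent category (the mode) is meaningful; averaging them is not.

```python
from collections import Counter

# Hypothetical coding scheme: brown=1 and blue=2 follow the example; 3 and 4 are assumed.
EYE_COLOR_CODES = {1: "brown", 2: "blue", 3: "green", 4: "gray"}

def eye_color_mode(codes):
    """Return the most frequent eye color label; the mode is valid for nominal data."""
    counts = Counter(codes)
    most_common_code, _ = counts.most_common(1)[0]
    return EYE_COLOR_CODES[most_common_code]

# Hypothetical coded values for five patients.
patient_eye_colors = [1, 2, 1, 3, 1]

print(eye_color_mode(patient_eye_colors))  # brown
# The "mean" of these codes would be 1.6, which corresponds to no real eye color,
# because the numbers are names, not quantities.
```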
Scales of Measure: Ordinal
• Includes all properties of nominal data (each ordinal value still has a name or label)
• Adds a meaningful order: values can be ranked
• Example: first, second, third
• But the intervals between ranks are not necessarily equal
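A short Python sketch of the ordinal properties, using the first/second/third example: sorting and the median depend only on order, so they are valid, while subtracting rank positions says nothing about how far apart the values really are. The list and function names are illustrative.

```python
# Ordinal values: the order is meaningful, the spacing is not.
PLACES = ["first", "second", "third"]  # ordered from best to worst

def rank(place):
    """Map an ordinal label to its position in the ordering (0 = first)."""
    return PLACES.index(place)

def median_place(places):
    """Return the median label; valid for ordinal data because it relies only on order."""
    ordered = sorted(places, key=rank)
    return ordered[(len(ordered) - 1) // 2]  # lower median for even-length lists

results = ["third", "first", "second", "first", "third"]
print(sorted(results, key=rank))  # ['first', 'first', 'second', 'third', 'third']
print(median_place(results))      # second

# rank("second") - rank("first") == 1 does NOT mean the gap between first and
# second equals the gap between second and third.
```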
Scales of Measure: Interval and Ratio
• Have equal intervals; ratio data also have an absolute (true) zero
• Examples: temperature in °C or °F (interval); distance, length, weight (ratio)
• Include the properties of nominal and ordinal data
• Interval and ratio data may be grouped together in one category called “scale”
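The difference between interval and ratio scales can be checked with a small Python sketch. Celsius temperature is used as the interval example and Kelvin (which has an absolute zero) as its ratio counterpart; the weights are hypothetical values.

```python
def celsius_to_kelvin(c):
    """Kelvin has an absolute zero, so it is a ratio scale; Celsius is interval."""
    return c + 273.15

def ratio(a, b):
    return a / b

# Differences are meaningful on an interval scale.
print(30.0 - 20.0)  # 10.0 degrees warmer, the same gap as from 10 to 20

# Ratios of Celsius values are misleading because 0 °C is not "no temperature".
print(ratio(20.0, 10.0))  # 2.0, but 20 °C is not "twice as hot" as 10 °C
print(round(ratio(celsius_to_kelvin(20.0), celsius_to_kelvin(10.0)), 3))  # 1.035

# Weight is a ratio scale: 0 kg means no weight, so ratios are meaningful.
print(ratio(80.0, 40.0))  # 2.0, i.e., twice as heavy
```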
Data Inconsistencies
• Inconsistent naming conventions, such as “systolic blood
pressure” versus “blood pressure, systolic”
• Inconsistent definitions, such as how the date of admission is
defined across departments
• Varying field lengths for the same data element, such as one
system allowing a patient’s last name to be up to 50 characters,
while another system allows 25 characters
• Varying codes for the same data element, such as M, F, or U for patient gender in one system, 1, 2, or 9 in another, and Male, Female, or Unknown in yet another
(American Health Information Management Association, 2012)
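One common fix for inconsistent codes is to map every source convention onto a single target code set before the data are combined. The sketch below assumes, hypothetically, that 1 = male, 2 = female, and 9 = unknown (the source lists the codes but not their meanings) and uses the 25-character limit from the last-name example as a field check.

```python
# Hypothetical mapping: which number means which gender is an assumption here.
GENDER_MAP = {
    "M": "M", "F": "F", "U": "U",
    "1": "M", "2": "F", "9": "U",
    "MALE": "M", "FEMALE": "F", "UNKNOWN": "U",
}

def normalize_gender(value):
    """Map a gender value from any of the three source conventions to M/F/U."""
    key = str(value).strip().upper()
    if key not in GENDER_MAP:
        raise ValueError(f"Unrecognized gender code: {value!r}")
    return GENDER_MAP[key]

def fits_field(last_name, max_length=25):
    """Check whether a last name fits the shorter target field (25 characters)."""
    return len(last_name) <= max_length

print([normalize_gender(v) for v in ["M", 2, "Unknown", "9"]])  # ['M', 'F', 'U', 'U']
print(fits_field("Garcia"))  # True
```

Raising an error for unrecognized codes, rather than silently mapping them to "U", keeps unexpected values visible so they can be checked against the data dictionary.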
Data Dictionaries
• The first step: obtain the data dictionary to understand your data
(Smith, 2016)
[Figures: example data dictionaries (Smith, 2016)]
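A data dictionary typically records, for each data element, its name, definition, data type, and allowed length or values. As a sketch only, the Python below represents a small, hypothetical dictionary (the element names and rules are assumptions, not the contents of the figures) and uses it to check a record for the kinds of inconsistencies listed earlier.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class DataElement:
    name: str
    definition: str
    data_type: type
    max_length: Optional[int] = None        # for text fields
    allowed_values: Optional[Tuple] = None  # for coded fields

# Hypothetical data dictionary entries.
DATA_DICTIONARY = [
    DataElement("last_name", "Patient's legal last name", str, max_length=50),
    DataElement("gender", "Administrative gender", str, allowed_values=("M", "F", "U")),
    DataElement("weight_kg", "Body weight in kilograms", float),
]

def validate(record):
    """Return a list of problems found when checking a record against the dictionary."""
    problems = []
    for element in DATA_DICTIONARY:
        value = record.get(element.name)
        if value is None:
            problems.append(f"{element.name}: missing")
            continue
        if not isinstance(value, element.data_type):
            problems.append(f"{element.name}: expected {element.data_type.__name__}")
            continue
        if element.max_length is not None and len(value) > element.max_length:
            problems.append(f"{element.name}: longer than {element.max_length} characters")
        if element.allowed_values is not None and value not in element.allowed_values:
            problems.append(f"{element.name}: {value!r} not in {element.allowed_values}")
    return problems

print(validate({"last_name": "Doe", "gender": "Male", "weight_kg": 70.5}))
# ["gender: 'Male' not in ('M', 'F', 'U')"]
```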
Common Terms Used in Statistical Analysis
• Population
• Sample
• Paired samples
• Data set
• Descriptive statistics
• Frequency table
• Histogram
• Chi-square test
• t-test
• Correlation vs. causation
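Of the terms above, the frequency table is the simplest to show in code: it counts how often each value occurs, and a histogram draws the same counts as bars (for numeric data, after grouping values into bins). A minimal Python sketch using hypothetical blood types:

```python
from collections import Counter

def frequency_table(values):
    """Return (category, count, percent) rows sorted by descending count."""
    counts = Counter(values)
    total = len(values)
    return [(category, n, round(100 * n / total, 1)) for category, n in counts.most_common()]

# Hypothetical blood types for ten patients.
blood_types = ["O+", "A+", "O+", "B+", "A+", "O+", "AB+", "O-", "A+", "O+"]

for category, n, pct in frequency_table(blood_types):
    print(f"{category:<4}{n:>3}{pct:>7}%")
```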
Term: Population
• The entire group of people or things that share a characteristic of interest
• Examples:
o Patients in a particular hospital
o Patients with a certain diagnosis
o Patients with a particular attribute (gender, smoking status, age
group)
o Patients who had a certain surgical procedure in a given year by a
specific surgeon
Term: Sample
A representative portion, or subset, of a population
• Example population: babies born in the United States in 2015
• Example sample: a selection of those babies
• Paired samples: measurements from before-and-after studies, or from subjects matched on one or more characteristics
(Kernler, 2014, CC BY-NC-SA 4.0)
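Sampling and pairing can be sketched in Python with the standard `random` module. The population of 1,000 record IDs and the blood pressure readings are hypothetical; a fixed seed is used only so the draw can be repeated.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every member has the same chance of selection."""
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical population: record IDs for 1,000 births.
population = [f"birth-{i:04d}" for i in range(1000)]
sample = simple_random_sample(population, 10, seed=42)
print(sample)

# Paired samples: each patient is measured before and after, so the analysis
# works with within-patient differences (hypothetical systolic readings, mmHg).
before = [150, 142, 160, 138]
after = [140, 139, 151, 135]
differences = [b - a for b, a in zip(before, after)]
print(differences)  # [10, 3, 9, 3]
```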
Confidence Intervals
• How well does a sample approximate the entire population?
• The confidence level is often set at 95 percent
• If sampling were repeated many times, approximately 95 percent of the resulting intervals would contain the true population parameter
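A minimal sketch of a 95 percent confidence interval for a mean, assuming the normal approximation (mean plus or minus z times the standard error, with z about 1.96). The weights are hypothetical; for small samples like this one, a t-based interval is usually preferred and would be somewhat wider.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for a population mean.

    Uses z from the standard normal distribution; for small samples a
    t critical value (giving a wider interval) is usually more appropriate.
    """
    n = len(values)
    m = mean(values)
    standard_error = stdev(values) / math.sqrt(n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return m - z * standard_error, m + z * standard_error

# Hypothetical sample of adult weights in kilograms.
weights = [62.0, 70.5, 81.2, 75.3, 68.9, 90.1, 77.4, 65.8, 72.6, 84.0]
low, high = mean_confidence_interval(weights)
print(f"95% CI for mean weight: {low:.1f} to {high:.1f} kg")
```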
Data Set
A data set is a collection of data for a specific purpose. For example, the data set used in this presentation is a collection of 500 records, each consisting of age, gender, state of residence, marital status, blood type, weight, eye color, and smoking status.
(Smith, 2016)
Descriptive Statistics
• Basic overview of the data
• Excel: Data > Data Analysis > Descriptive Statistics (requires the Analysis ToolPak add-in)
• Should be among the first analyses done on a set of data
• Can reveal some errors, such as impossible minimum or maximum values
• Mean (average), number of records (count), range of values, maximum and minimum values
(Smith, 2016)
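The measures listed above (count, mean, minimum, maximum, range) can be computed with Python's standard library, as a rough equivalent of the Excel report. The ages are hypothetical, and the 0 to 120 plausibility range is an assumed rule for flagging likely errors.

```python
from statistics import mean

def describe(values):
    """Count, mean, minimum, maximum, and range of a list of numbers."""
    lowest, highest = min(values), max(values)
    return {
        "count": len(values),
        "mean": mean(values),
        "minimum": lowest,
        "maximum": highest,
        "range": highest - lowest,
    }

# Hypothetical ages; 250 is a deliberate data-entry error.
ages = [34, 51, 27, 250, 45, 62]
summary = describe(ages)
print(summary["maximum"], summary["range"])  # 250 223

# Hypothetical plausibility check: ages outside 0-120 are flagged for review.
suspicious = [age for age in ages if not 0 <= age <= 120]
print(suspicious)  # [250]
```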
Correlation and Causation
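• Correlation: a statistical relationship in which two variables tend to change together
• Causation: a change in one variable produces a change in the other; correlation alone does not establish causation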
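Correlation is often measured with the Pearson coefficient r, which ranges from -1 to 1. The sketch below computes it from its definition (covariance divided by the product of the spreads); the values are hypothetical. Even a large r would not show causation, since a third factor could drive both variables.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient: strength of a linear relationship, from -1 to 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return cov / (sx * sy)

# Two hypothetical variables measured on the same five patients.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 2))  # 0.77
# A positive r says x and y rise together; it does not say that x causes y.
```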
The Potential of Big Data in Healthcare
• Expand capacity to generate new knowledge
o The effectiveness of treatments (Schneeweiss, 2014)
o The prediction of outcomes (Schneeweiss, 2014)
• Disseminate knowledge
• Use analytics to combine electronic health records and genomic data to translate personalized medicine into clinical practice
• Deliver information directly to patients and increase patient participation in their healthcare
What Are Big Data?
• Characteristics of big data:
o Volume (i.e., the size of the dataset)
o Variety (i.e., data from multiple repositories, domains, or
types)
o Velocity (i.e., rate of flow)
o Variability (i.e., the change in other characteristics)
• Traditional data architectures (such as typical relational databases) cannot efficiently handle this type of data
• Scalable architectures are required for efficient storage, manipulation, and analysis
(National Institute of Standards and Technology, 2015)
Tools
• Hadoop
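o Open-source framework for storing data and running applications on clusters of commodity hardware
• MongoDB
o NoSQL database that stores data as documents made up of field-value pairs
• Other NoSQL databases and utilities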
(Sas.com, 2016)
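Hadoop's original processing model, MapReduce, splits a job into a map step that runs on each record and a reduce step that combines results by key, so the map work can be spread across a cluster. The single-process Python sketch below imitates that pattern over document-style records (plain dictionaries with fields, loosely like MongoDB documents). It is not the Hadoop or MongoDB API, and the records are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Document-style records: each is a set of field-value pairs, and the fields
# can differ between documents (hypothetical patient visits).
documents = [
    {"patient_id": 1, "diagnoses": ["sepsis", "pneumonia"]},
    {"patient_id": 2, "diagnoses": ["pneumonia"], "smoker": True},
    {"patient_id": 3, "diagnoses": ["diabetes", "pneumonia"]},
]

def map_phase(document):
    """Emit (key, 1) pairs: one per diagnosis in a document."""
    return [(diagnosis, 1) for diagnosis in document.get("diagnoses", [])]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

# On a cluster, map tasks would run in parallel on different machines;
# here they run one after another in a single process.
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'sepsis': 1, 'pneumonia': 3, 'diabetes': 1}
```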
Requirements for Analytics for Learning Systems
• A way to ensure that patient groups being compared are truly similar
• Automated tools for analysis
• Ability to rapidly run automated tools against new data
• Software that can be used with little training and helps prevent errors in
interpretation
• Easily understood results
Challenges Facing Biomedical Big Data
• Amount of information
• Lack of organization
• Lack of access to data and tools
• Insufficient training in data science methods
(National Institutes of Health, 2015)
Introduction to Healthcare Data Analytics
Summary—Lecture B
• Data come in many forms, and those forms determine what can
or cannot be done with the data.
• Big data have the potential to advance healthcare.
• Analysis of big data often relies on tools such as Hadoop and MongoDB.
• However, biomedical big data face many challenges.
References
American Health Information Management Association. (2012). Managing a data dictionary. Journal of AHIMA, 83(1), 48-52. Retrieved from http://library.ahima.org/PB/DataDictionary#.WI9uCVMrJhE
Bertolucci, J. (2013). Big data analytics: Descriptive vs. predictive vs. prescriptive. InformationWeek. Retrieved from http://www.informationweek.com/big-data/big-data-analytics/big-data-analytics-descriptive-vs-predictive-vs-prescriptive/d/d-id/1113279
Dictionary.com. (n.d.). Nominal scale. Retrieved from http://www.dictionary.com/browse/nominal-scale
Escobar, G. J., Puopolo, K. M., Wi, S., Turk, B. J., Kuzniewicz, M. W., Walsh, E. M., ... & Draper, D. (2014). Stratification of risk of early-onset sepsis in newborns ≥34 weeks’ gestation. Pediatrics, 133(1), 30-36. Retrieved from http://pediatrics.aappublications.org/content/pediatrics/133/1/30.full.pdf
Gartner. (2011, October 17). Gartner says worldwide enterprise IT spending to reach $2.7 trillion in 2012. Retrieved from http://www.gartner.com/newsroom/id/1824919
Gartner IT Glossary. (2015). Descriptive analytics. Retrieved from http://www.gartner.com/it-glossary/descriptive-analytics
Gartner IT Glossary. (2015). Diagnostic analytics. Retrieved from http://www.gartner.com/it-glossary/diagnostic-analytics
IBM. (2013). Descriptive, predictive, prescriptive: Transforming asset and facilities management with analytics. Retrieved from http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?infotype=SA&subtype=WH&htmlfid=TIW14162USEN
Institute of Medicine of the National Academies. (2012). Best care at lower cost: The path to continuously learning health care in America. Washington, DC: Institute of Medicine of the National Academies. Retrieved from http://www.nationalacademies.org/hmd/Reports/2012/Best-Care-at-Lower-Cost-The-Path-to-Continuously-Learning-Health-Care-in-America.aspx
Institute of Medicine of the National Academies. (n.d.). The learning health care system in America. Retrieved from http://www.nationalacademies.org/hmd/Activities/Quality/LearningHealthCare.aspx
Khanduja, J. (2015). Six steps of an analytics project. Quality Assurance and Project Management. Retrieved from http://itknowledgeexchange.techtarget.com/quality-assurance/six-steps-of-an-analytics-project/
Mayo Clinic. (2016). Overview - Sepsis. Retrieved from http://www.mayoclinic.org/diseases-conditions/sepsis/home/ovc-20169784
Murdoch, T., & Detsky, A. (2013). The inevitable application of big data to health care. JAMA, 309(13), 1351. Retrieved from http://dx.doi.org/10.1001/jama.2013.393
National Institute of Standards and Technology (NIST). (2015). NIST big data interoperability framework: Volume 1, definitions. Gaithersburg, MD: NIST. Retrieved from http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.1500-1.pdf
National Institutes of Health. (2015). What is big data? Retrieved from http://datascience.nih.gov/bd2k/about/what
NIST/SEMATECH e-Handbook of statistical methods. (n.d.). Retrieved May 2, 2016, from http://www.itl.nist.gov/div898/handbook/
Sas.com. (2016). What is Hadoop? Retrieved from http://www.sas.com/en_my/insights/big-data/hadoop.html
Schneeweiss, S. (2014). Learning from big health care data. New England Journal of Medicine, 370(23), 2161-2163. Retrieved from http://www.nejm.org/doi/full/10.1056/NEJMp1401111#t=article
Shapira, G. (2016). The seven key steps of data analysis. Oracle.com. Retrieved from http://www.oracle.com/us/corporate/profit/big-ideas/052313-gshapira-1951392.html
Figures
Ackoff, R. (1989). From data to wisdom. Presidential address to ISGSR, June 1988. Journal of Applied Systems Analysis, 16(1), 3-9.
Muirhead, C., & Dimitrakakis, J. (2014). Clinical & business intelligence: An analytics executive review needs assessment. Healthcare Information and Management Systems Society. Retrieved from http://www.himss.org/ResourceLibrary/genResourceDetailPDF.aspx?ItemNumber=34692
Smith, K. (2016). Data dictionaries. Used with permission from Kimberly Smith.
Smith, K. (2016). Data set. Used with permission from Kimberly Smith.
Smith, K. (2016). Descriptive statistics. Used with permission from Kimberly Smith.
Smith, K. (2016). Synthetic data set. Used with permission from Kimberly Smith.
Images
Look Into My Eyes. (2009). Girl’s blue eye [Online image]. Retrieved April 28, 2016, from https://commons.wikimedia.org/wiki/File:Deep_Blue_eye.jpg
Kehrer, P. (2009). Win, place, show [Online image]. Retrieved from https://www.flickr.com/photos/paulkehrer/3659279740
Centers for Disease Control and Prevention. (2010). Growth charts [Online image]. Retrieved May 2, 2016, from http://www.cdc.gov/growthcharts/
Lite. (2007). Soft ruler [Online image]. Retrieved from https://commons.wikimedia.org/wiki/File:Soft_ruler.jpg
Menchi. (2005). Clinical thermometer 38.7 [Online image]. Retrieved from https://commons.wikimedia.org/wiki/File:Clinical_thermometer_38.7.JPG#/media/File:Clinical_thermometer_38.7.JPG
Kernler, D. (2014). A visual representation of selecting a simple random sample [Online image]. Retrieved from https://commons.wikimedia.org/wiki/File:Simple_random_sampling.PNG
This material was developed by The University of Texas Health Science Center at Houston, funded by the Department of Health and
Human Services, Office of the National Coordinator for Health Information Technology under Award Number 90WT0006.
This presentation was produced with the support of the United States Agency for
International Development (USAID) under the terms of MEASURE Evaluation
cooperative agreement AID-OAA-L-14-00004. MEASURE Evaluation is
implemented by the Carolina Population Center, University of North Carolina at
Chapel Hill in partnership with ICF International; John Snow, Inc.; Management
Sciences for Health; Palladium; and Tulane University. Views expressed are not
necessarily those of USAID or the United States government.
www.measureevaluation.org