Metodologi Booklet Edited May 2012
GUIDEBOOK
SCIENTIFIC RESEARCH, STATISTICS AND AN INTRODUCTION TO SPSS
7th Printing
By: Dr Suhazeli bin Abdullah, Family Medicine Specialist, Klinik Kesihatan Marang, Terengganu.
In conjunction with
Introduction to SPSS, organised by Jabatan Kesihatan Negeri Terengganu (Terengganu State Health Department), 15 and 16 May 2012
The material in this book and CD is NOT copyrighted by the author and may be reprinted without the author's permission. All the reference materials can be downloaded from the website suhazeli-files.blogspot.com. As a courtesy, requests to reprint or reproduce material from this book or CD can be sent by email to [email protected], or by post to Klinik Kesihatan Marang, 21600, Marang, Terengganu.
Printing 1: March 2005
Printing 2: March 2006
Printing 3: May 2006
Printing 4: March 2008
Printing 5: April 2009
Printing 6: April 2010
Printing 7: March 2012
Printing 8: May 2012
TABLE OF CONTENTS
A Word from the Author
Foreword to the Seventh Printing
Acknowledgement
Contents of the Compact Disc (DVD)
CHAPTER II. Basic Statistics
    Definitions of Statistical Terms
    What Is Epidemiology?
    Some Statistical Measurements
    Confounding
    Measurement Error and Bias
    Analysing Validity
    Types of Statistical Study and Study Design
CHAPTER III. Conducting a Scientific Study
    Research Objective
    Study Hypothesis
    How to Search for Materials
    How to Do a Study
    How to Choose a Study Topic
    Data Collection
    Sampling Method
    Inferential Statistics
CHAPTER IV. Basic Introduction to SPSS
    Introduction to Computerised Analysis Systems
    Getting Started
    Saving Data for Analysis
    Creating Variables in SPSS
    Adding Labels to Variables
    Transform Data [Compute & Recode]
    Exploring
    Frequency
    Descriptives
    Import File, Copy & Paste
    Select and Deselect Case
CHAPTER V. Parametric Analysis
    Z-Test
    Chi-Square Test [χ²]
    Independent T-Test
    Paired T-Test
    ANOVA (Analysis of Variance)
    Correlation
    Regression
References
Index
Appendix: Matching Statistical Tests to Variable Types
Steps for Analysing Data
Buku Panduan Kajian Saintifik, Statistik dan Pengenalan SPSS
List of Exercises
Exercise 1: Entering variable names and types in the SPSS spreadsheet
Exercise 2: Completing variables and labels in the SPSS spreadsheet
Exercise 3: Performing calculations with the formulas you know
Exercise 4: Recoding several of the variables given
Exercise 5: Exploring the variables given
Exercise 6: Obtaining the frequencies of the variables given
Exercise 7: Trying the DESCRIPTIVES command yourself on the numerical variables given
Exercise 8: Importing an Excel-format file (on the CD; file name: Drug T Student) into SPSS
Exercise 9: Trying COPY & PASTE yourself with other figures and tables
Exercise 10: Interpreting the table given
Exercise 11: Analysing the file tbkkp.sav for tuberculosis patients using what you have learned
Exercise 12: Finding the p value for a study on obesity in the district of Setiu
Exercise 13: Finding the correlation between mothers' weight during pregnancy and babies' birth weight
Exercise 14: Running a logistic regression on maternal smoking and SGA babies
DR SUHAZELI BIN ABDULLAH
Family Medicine Specialist
Klinik Kesihatan Marang
[email protected]
Tel: 09-6182030
Fax: 09-6184485
Saturday, May 12, 2012
Dr Suhazeli Abdullah, FMS, Klinik Kesihatan Marang, Bandar Marang, Terengganu. Visit the website suhazeli-files.blogspot.com
Acknowledgement
The material for this guidebook was drawn from several sources, which may or may not be fully accurate. The main sources were the handbooks from the SPSS workshops (basic and advanced sessions) organised by the Medical Research Unit, Faculty of Medicine, UKM, and the lecture notes of Dr Azmi Mohd Tamil and Dr Mohd Rizal Abdul Manaf. I also drew on several workshops I attended, including the UKM Research Week, the Evidence Based Medicine Course (UMMC), and the Workshop on Research Network Development for WONCA Asia Pacific Region. I welcome any corrections or comments on errors in this guidebook, and can be contacted by email or telephone. Exercise notes are included with this guidebook on compact disc. I hope you all benefit from them.
CHAPTER II.
Basic Statistics
5. Raw Data
Observations that have not yet been processed.
6. Sample Size
The number of individuals selected as the sample.
7. Variable
A characteristic measured in the study.
8. Qualitative
Data recorded as categories or groups rather than as numerical measurements.
9. Quantitative
Data measured as numerical values.
10. Dichotomous
A variable with two possible values, such as Yes/No.
11. Polytomous
A variable with more than two possible values.
12. Bias
Study error that can arise from the investigator, from mistakes during data collection, or from flaws in the study design.
13. Confounders
Study error caused by characteristics of the sample that distort the association under study. These are usually characteristics that cannot be changed, such as age, sex and ethnicity.
15. P Value
A value used to judge whether an observed difference or association is statistically significant.
16. Mean
The average of the numbers in a set of study data. Usually used for normally distributed data.
17. Median
"Middle value" of a list. The smallest number such that at least half the numbers in the list are no greater than it. If the list has an odd number of entries, the median is the middle entry in the list after sorting the list into increasing order. If the list has an even number of entries, the median is equal to the sum of the two middle (after sorting) numbers divided by two. The median can be estimated from a histogram by finding the smallest number such that the area under the histogram to the left of that number is 50%.
18. Mode
For lists, the mode is the most common (frequent) value. A list can have more than one mode. For histograms, a mode is a relative maximum ("bump").
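The three definitions above can be checked on a small list. The following is a minimal sketch using Python's standard `statistics` module; the list of values is hypothetical and not taken from the booklet.

```python
from statistics import mean, median, multimode

# Hypothetical list of observations (illustration only).
values = [2, 3, 3, 5, 7, 10]

average = mean(values)      # sum of the values divided by their count
middle = median(values)     # even count: mean of the two middle values (3 and 5)
modes = multimode(values)   # every most-frequent value, returned as a list

print(average, middle, modes)
```

`multimode` is used rather than `mode` because, as noted above, a list can have more than one mode.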
What Is Epidemiology?
Epidemiology is the study of how often diseases occur and why. Epidemiological information is used to plan and evaluate strategies for preventing disease and as a guide to treating it. Epidemiological studies are carried out on populations at risk. Studying every member of a population is very difficult, so observations are usually made on a study sample, drawn from the larger population by specific methods.
[Figure: target population, study population and study sample]
a)
Incidence
The incidence of a disease is the rate at which new cases occur in a population during a specified period. For example, the incidence of thyrotoxicosis in 1998 was 10/100,000/year in Singapore compared with 49/100,000/year in Malaysia.
Roughly:
$$\text{Incidence} = \frac{\text{number of new cases}}{\text{population at risk} \times \text{time}}$$
or
$$\text{Incidence} = \frac{\text{number of new cases}}{\text{total person-years at risk}}$$
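As a rough illustration of the incidence formula, the sketch below computes a rate per 100,000 person-years. The function name `incidence_rate` and the figures passed to it are hypothetical, chosen only to show the arithmetic.

```python
def incidence_rate(new_cases, population_at_risk, years, per=100_000):
    """Return new cases per `per` person-years at risk."""
    person_years = population_at_risk * years
    return new_cases * per / person_years

# Hypothetical figures: 30 new cases among 150,000 people followed for 2 years.
rate = incidence_rate(new_cases=30, population_at_risk=150_000, years=2)
print(f"{rate:.1f} per 100,000 per year")
```

Expressing the denominator as person-years makes the two forms of the formula equivalent: population at risk multiplied by time is the total person-years observed.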
b)
Prevalence
The prevalence of a disease is the proportion of a population that are cases at a particular point in time. Example: the prevalence of wheezing among primary school children in Malaysia, studied in 1986, was estimated at 3%. The symptom was ascertained from parents' answers to a questionnaire distributed to them. Prevalence is an appropriate measure only for relatively stable conditions; it is not suitable for acute diseases.
$$\text{Prevalence} = \text{incidence} \times \text{average duration}$$
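The relation between prevalence and incidence can be sketched as follows, assuming a stable condition so that the approximation holds. The function name and the input figures are hypothetical.

```python
def prevalence_from_incidence(incidence_per_year, mean_duration_years):
    """Approximate prevalence (as a proportion) for a stable condition."""
    return incidence_per_year * mean_duration_years

# Hypothetical: 5 new cases per 1,000 people per year, each lasting 6 years on average.
prevalence = prevalence_from_incidence(5 / 1000, 6)
print(f"{prevalence:.1%}")
```

The approximation breaks down for acute diseases, where duration is short and variable, which is one reason prevalence is a poor measure for them.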
c)
d)
Mortality
Mortality is the incidence of death from a disease.
e)
Rate                       Numerator                                            Denominator
Fertility rate             Number of live births                                Number of women aged 15-44 years
Infant mortality rate      Number of infant (< 1 year) deaths                   Number of live births
Stillbirth rate            Number of intrauterine deaths after 28 weeks         Total births
Perinatal mortality rate   Number of stillbirths + deaths in 1st week of life   Total births
NB: These rates are usually related to one year.
a)
Exposed/unexposed Population
A population exposed to a risk factor for a disease is at risk of developing that disease, and an unexposed population is not. But how far is this statement true? It must be demonstrated by statistical calculation.
Example: Coal workers are at high risk of lung cancer because they are directly exposed to coal. People who do not work with coal are not exposed and so do not carry this added risk. However, not everyone who is exposed will develop lung cancer, and the reverse also holds.
b)
Attributable risk
Attributable risk is the disease rate in exposed persons minus the rate in unexposed persons. It measures the additional risk of disease that a person incurs through exposure to the risk factor. For example, a person deciding whether to take up a strenuous activity such as hill climbing would weigh the attributable risk of injury against the enjoyment of the sport.
c)
Relative risk
Relative risk is the ratio of the disease rate in exposed persons to that in people who are unexposed. It is related to attributable risk by the formula:
$$\text{Attributable risk} = \text{rate of disease in unexposed persons} \times (\text{relative risk} - 1)$$
Relative risk is less relevant to making decisions in risk management than is attributable risk. For example, given a choice between a doubling in their risk of death from bronchial carcinoma and a doubling in their risk of death from oral cancer, most informed people would opt for the latter. The relative risk is the same (two), but the corresponding attributable risk is lower because oral cancer is a rarer disease. Nevertheless, relative risk is the measure of association most often used by epidemiologists. One reason for this is that it can be estimated by a wider range of study designs. In particular, relative risk can be estimated from case-control studies, whereas attributable risk cannot. Another reason is the empirical observation that where two risk factors for a disease act in concert, their relative risks often come close to multiplying. Closely related to relative risk is the odds ratio, defined as:
$$\text{Odds ratio} = \frac{\text{odds of disease in exposed persons}}{\text{odds of disease in unexposed persons}}$$
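The three measures of association can be computed side by side from simple counts. This is a minimal sketch: the function name `risk_measures` and the cohort figures are hypothetical, and the last line checks the identity linking attributable risk and relative risk given above.

```python
def risk_measures(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Return (relative risk, attributable risk, odds ratio) from cohort counts."""
    rate_exposed = exposed_cases / exposed_total
    rate_unexposed = unexposed_cases / unexposed_total
    relative_risk = rate_exposed / rate_unexposed
    attributable_risk = rate_exposed - rate_unexposed
    odds_exposed = exposed_cases / (exposed_total - exposed_cases)
    odds_unexposed = unexposed_cases / (unexposed_total - unexposed_cases)
    return relative_risk, attributable_risk, odds_exposed / odds_unexposed

# Hypothetical cohort: 30 cases among 1,000 exposed, 10 cases among 1,000 unexposed.
rr, ar, odds_ratio = risk_measures(30, 1000, 10, 1000)
print(f"RR = {rr:.2f}, AR = {ar:.3f}, OR = {odds_ratio:.2f}")

# Identity from the text: AR = rate in unexposed x (RR - 1)
identity_gap = abs(ar - (10 / 1000) * (rr - 1))
```

With a rare disease, as here, the odds ratio is close to the relative risk, which is why case-control studies can use it as an estimate.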
Confounding
In an ideal laboratory experiment the investigator alters only one variable at a time, so that any effect he observes can only be due to that variable. Most epidemiological studies are observational, not experimental, and compare people who differ in all kinds of ways, known and unknown. If such differences determine risk of disease independently of the exposure under investigation, they are said to confound its association with the disease. For example, several studies have indicated high rates of lung cancer in cooks. Though this could be a consequence of their work (perhaps caused by carcinogens in fumes from frying), it may be simply because professional cooks smoke more than the average. In other words, smoking might confound the association with cooking.
Confounding determines the extent to which observed associations are causal. It may give rise to spurious associations when in fact there is no causal relation, or at the other extreme, it may obscure the effects of a true cause. Two common confounding factors are age and sex. Crude mortality from all causes in males over a five year period was higher in Bournemouth than in Southampton. However, this difference disappeared when death rates were compared for specific age groups (Table 3.2). It occurred not because Bournemouth is a less healthy place than Southampton but because, being a town to which people retire, it has a more elderly population.
Table 3.2 Deaths in males in Bournemouth and Southampton during a five year period

                    Bournemouth                           Southampton
Age group    No of                Annual death     No of                Annual death
(years)      deaths  Population   rate per         deaths  Population   rate per
                                  100 000                               100 000
<1              116         919     2 524             223       1 897     2 351
1-44            204      34 616       118             332      64 090       104
45-64         1 252      19 379     1 292           1 728      24 440     1 414
65+           4 076      11 760     6 932           3 639       9 120     7 980
All ages      5 648      66 674     1 694           5 922      99 547     1 190
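The rates in Table 3.2 can be recomputed from the deaths and populations it lists. The sketch below assumes that each annual rate is deaths divided by (population x 5 years), scaled to 100 000; the function and variable names are illustrative.

```python
YEARS = 5  # length of the observation period in Table 3.2

# (deaths, population) by age group, as listed in Table 3.2
bournemouth = {"<1": (116, 919), "1-44": (204, 34616),
               "45-64": (1252, 19379), "65+": (4076, 11760)}
southampton = {"<1": (223, 1897), "1-44": (332, 64090),
               "45-64": (1728, 24440), "65+": (3639, 9120)}

def annual_rate(deaths, population, years=YEARS, per=100_000):
    """Annual death rate per `per` population."""
    return deaths / (population * years) * per

def crude_rate(town):
    """All-ages rate: total deaths over total population."""
    deaths = sum(d for d, _ in town.values())
    population = sum(p for _, p in town.values())
    return annual_rate(deaths, population)

for age in bournemouth:
    b = annual_rate(*bournemouth[age])
    s = annual_rate(*southampton[age])
    print(f"{age:>6}: Bournemouth {b:6.0f}   Southampton {s:6.0f}")
print(f" crude: Bournemouth {crude_rate(bournemouth):6.0f}   "
      f"Southampton {crude_rate(southampton):6.0f}")
```

The crude rate is higher in Bournemouth even though no age group shows a consistent excess there, which is the confounding by age described above.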
1. Standardisation
The above example shows the dangers of drawing aetiological conclusions from comparisons of crude rates. The problem can be overcome by comparing age and sex specific rates as in Table 3.2, but the presentation of such data is rather cumbersome, and it is often helpful to derive a single statistic that summarises the comparison while allowing for differences in the age and sex structure of the
populations under study. Standardised or adjusted rates provide for this need. Two techniques are available:
a)
Direct standardisation
Direct standardisation entails comparison of weighted averages of age and sex specific disease rates, the weights being equal to the proportion of people in each age and sex group in a convenient reference population. Table 3.3 shows the method of calculation, based on mortality from coronary heart disease in men in the USA aged 35-64 during 1968. Table 3.4 gives standardised rates for men and women in the ensuing years, calculated in the same way, and shows a remarkable fall.
Table 3.3 Example of direct standardisation, based on mortality from coronary heart disease (CHD) in men in the USA aged 35-64, 1968

Age group    CHD deaths/100 000   % of reference population   (1) x (2)
(years)      (1)                  in age group (2)
35-44          93                  34.4                         3 199.2
45-54         355                  36.0                        12 780.0
55-64         961                  29.5                        28 349.5
Total                             100                          44 328.7

Age standardised rate = 44 328.7 / 100 = 443 deaths/100 000
Table 3.4 Coronary heart disease in American men and women aged 35-64: changes in age standardised mortality (deaths/100 000/year) during 1968-1974

Year    Men    Women
1968    443    134
1969    430    126
1970    420    126
1971    413    124
1972    408    120
1973    399    118
1974    377    111
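The calculation in Table 3.3 is a weighted average and can be sketched in a few lines. The rates 93, 355 and 961 and the weights 34.4, 36.0 and 29.5 are read from Table 3.3; the variable names are illustrative. The divisor is taken as 100, following the table's total, although the rounded percentages sum to 99.9.

```python
# Column (1): age-specific CHD deaths per 100 000, from Table 3.3
rates_per_100k = [93, 355, 961]
# Column (2): percentage of the reference population in each age group
reference_percent = [34.4, 36.0, 29.5]

# Column (1) x (2), summed over the age groups
weighted_sum = sum(r * w for r, w in zip(rates_per_100k, reference_percent))

# The table divides by 100 (the percentages total 100 before rounding).
standardised_rate = weighted_sum / 100
print(round(standardised_rate))
```

Using the same reference weights for every year is what makes the rates in Table 3.4 comparable with one another.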
b)
Indirect standardisation
The direct method is suited to large studies, and in most surveys the indirect method yields more stable risk estimates. Suppose that a general practitioner wants to test his impression of a local excess of chronic bronchitis. Using a standard questionnaire, he examines a sample of middle aged men from his list, and finds that 45 have persistent cough and phlegm. Is this excessive? The calculation is shown in Table 3.5.
Table 3.5 Example of indirect standardisation

Age (years)   No in study (1)   Symptom prevalence in   Expected cases
                                reference group (2)     = (1) x (2)
35-44         150                8%                      12
45-54         100                9%                       9
55-64          90               10%                       9
Total                                                    30

First the numbers of subjects in each age class are listed (column 1). The doctor must then choose a suitable reference population in which the class specific rates are known (column 2). (In mortality studies this would usually be the nation or some subset of it, such as a particular region or social class; in multicentre studies it could be the pooled data from all centres.) Cross multiplying columns 1 and 2 for each class gives the expected number of cases in a group of that age and size, based on the reference population's rates. Summation over all classes yields the total expected frequency, given the size and age structure of that particular study sample. Where 30 cases were expected he has observed 45, giving
an age adjusted relative risk or standardised prevalence ratio of 45/30 = 150%. (Conventionally, standardised ratios are often expressed as percentages.) A comparable statistic, the standardised mortality ratio (SMR) is widely used by the registrar general in summarising time trends and regional and occupational differences. Thus in 1981 the standardised mortality ratio for death by suicide in male doctors was 172%, indicating a large excess relative to the general population at the time. To analyse time trends, as with the cost of living index, an arbitrary base year is taken.
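The indirect method described above reduces to an observed-over-expected ratio. The following sketch uses the figures from Table 3.5 and the 45 observed cases from the text; the variable names are illustrative.

```python
# Column (1): number of men in the study sample, by age class
study_counts = [150, 100, 90]
# Column (2): symptom prevalence in the reference population, by age class
reference_prevalence = [0.08, 0.09, 0.10]
observed_cases = 45

# Expected cases if the sample had the reference population's rates
expected_cases = sum(n * p for n, p in zip(study_counts, reference_prevalence))

# Standardised prevalence ratio, conventionally expressed as a percentage
standardised_ratio = observed_cases / expected_cases * 100
print(f"expected = {expected_cases:.0f}, ratio = {standardised_ratio:.0f}%")
```

A standardised mortality ratio is computed the same way, with deaths in place of symptomatic cases.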
c)
Other methods of adjusting for confounders
The techniques of standardisation are usually used to adjust for age and sex, although they can be applied to control for other confounders. Other methods, which are used more generally to adjust for confounding, include mathematical modelling techniques such as logistic regression. These assume that a person's risk of disease is a specified mathematical function of his exposure to different risk factors and confounders. For example, it might be assumed that his odds of developing lung cancer are a product of a constant and three parameters - one determined by his age, one by whether he smokes, and the third by whether he has worked with asbestos. A computer program is then used to calculate the values of the parameters that best fit the observed data. These parameters estimate the odds ratios for each risk factor - age, smoking, and exposure to asbestos, and are mutually adjusted. Such modelling techniques are powerful and readily available to users of personal computers. They should be used with caution, however, as the mathematical assumptions in the model may not always reflect the realities of biology.
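The multiplicative form of the model described above can be illustrated with fixed numbers. This sketch does not fit a model: in practice the constant and the odds ratios are estimated from data by a statistics package, whereas here every value is hypothetical and age is simplified to a single yes/no factor.

```python
def odds_of_disease(baseline_odds, odds_ratios, exposures):
    """Multiply the baseline odds by the odds ratio of each factor present."""
    odds = baseline_odds
    for factor, present in exposures.items():
        if present:
            odds *= odds_ratios[factor]
    return odds

# Hypothetical parameter values, chosen only to illustrate the arithmetic.
baseline_odds = 0.001
odds_ratios = {"older_age": 2.0, "smoker": 10.0, "asbestos": 5.0}

odds = odds_of_disease(
    baseline_odds, odds_ratios,
    {"older_age": True, "smoker": True, "asbestos": False},
)
risk = odds / (1 + odds)  # convert odds to a probability
print(f"odds = {odds:.3f}, risk = {risk:.2%}")
```

Because the factors multiply, each odds ratio describes the effect of one factor with the others held fixed, which is what "mutually adjusted" means here.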
such as white spirit, mothers of malformed babies were questioned about their contact with such substances during pregnancy, and their answers were compared with those from control mothers with normal babies. With this design there was a danger that "case" mothers, who were highly motivated to find out why their babies had been born with an abnormality, might recall past exposure more completely than controls. If so, a bias would result with a tendency to exaggerate risk estimates. Another study looked at risk of hip osteoarthritis according to physical activity at work, cases being identified from records of admission to hospital for hip replacement. Here there was a possibility of bias because subjects with physically demanding jobs might be more handicapped by a given level of arthritis and therefore seek treatment more readily. Bias cannot usually be totally eliminated from epidemiological studies. The aim, therefore, must be to keep it to a minimum, to identify those biases that cannot be avoided, to assess their potential impact, and to take this into account when interpreting results. The motto of the epidemiologist could well be "dirty hands but a clean mind" (manus sordidae, mens pura).
3. Measurement error
As indicated above, errors in measuring exposure or disease can be an important source of bias in epidemiological studies. In conducting studies, therefore, it is important to assess the quality of measurements. An ideal survey technique is valid (that is, it measures accurately what it purports to measure). Sometimes a reliable standard is available against which the validity of a survey method can be assessed. For example, a sphygmomanometer's validity can be measured by comparing its
readings with intraarterial pressures, and the validity of a mammographic diagnosis of breast cancer can be tested (if the woman agrees) by biopsy. More often, however, there is no sure reference standard. The validity of a questionnaire for diagnosing angina cannot be fully known: clinical opinion varies among experts, and even coronary arteriograms may be normal in true cases or abnormal in symptomless people. The pathologist can describe changes at necropsy, but these may say little about the patient's symptoms or functional state. Measurements of disease in life are often incapable of full validation. In practice, therefore, validity may have to be assessed indirectly. Two approaches are used commonly. A technique that has been simplified and standardized to make it suitable for use in surveys may be compared with the best conventional clinical assessment. A self administered psychiatric questionnaire, for instance, may be compared with the majority opinion of a psychiatric panel. Alternatively, a measurement may be validated by its ability to predict future illness. Validation by predictive ability may, however, require the study of many subjects.
Analysing validity
When a survey technique or test is used to dichotomise subjects (for example, as cases or non-cases, exposed or not exposed) its validity is analysed by classifying subjects as positive or negative, firstly by the survey method and secondly according to the standard reference test. The findings can then be expressed in a contingency table as shown below.
Table 4.1 Comparison of a survey test with a reference test

Survey test   Reference test result
result        Positive                         Negative                         Totals
Positive      True positives, correctly        False positives = (b)            Total test positives = (a + b)
              identified = (a)
Negative      False negatives = (c)            True negatives, correctly        Total test negatives = (c + d)
                                               identified = (d)
Totals        Total true positives = (a + c)   Total true negatives = (b + d)   Grand total = (a + b + c + d)
From this table four important statistics can be derived:
Sensitivity - A sensitive test detects a high proportion of the true cases, and this quality is measured here by a/(a + c).
Specificity - A specific test has few false positives, and this quality is measured by d/(b + d).
Systematic error - For epidemiological rates it is particularly important for the test to give the right total count of cases. This is measured by the ratio of the total numbers positive to the survey and the reference tests, or (a + b)/(a + c).
Predictive value - This is the proportion of positive test results that are truly positive, a/(a + b). It is important in screening.
It should be noted that both systematic error and predictive value depend on the relative frequency of true positives and true negatives in the study sample (that is, on the prevalence of the disease or exposure that is being measured).
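The four statistics can be computed directly from the cell counts a, b, c and d of the contingency table. This is a minimal sketch; the function name `validity_statistics` and the example counts are hypothetical.

```python
def validity_statistics(a, b, c, d):
    """a = true positives, b = false positives,
    c = false negatives, d = true negatives."""
    return {
        "sensitivity": a / (a + c),             # true cases detected
        "specificity": d / (b + d),             # non-cases correctly negative
        "systematic_error": (a + b) / (a + c),  # test positives vs true positives
        "predictive_value": a / (a + b),        # positives that are truly positive
    }

# Hypothetical survey of 200 subjects with 50 true cases.
stats = validity_statistics(a=45, b=10, c=5, d=140)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Re-running the function with the same sensitivity and specificity but fewer true cases shows the predictive value falling, which is the dependence on prevalence noted above.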
breast cancer alternative diagnostic criteria were compared with the results of a reference test (biopsy). Clinical palpation by a doctor yielded fewest false positives (93% specificity), but missed half the cases (50% sensitivity). Criteria for diagnosing "a case" were then relaxed to include all the positive results identified by doctor's palpation, nurse's palpation, or xray mammography: few cases were then missed (94% sensitivity), but specificity fell to 86%. By choosing the right test and cut off points it may be possible to get the balance of sensitivity and specificity that is best for a particular study. In a survey to establish prevalence this might be when false positives balance false negatives. In a study to compare rates in different populations the absolute rates are less important, the primary concern being to avoid systematic bias in the comparisons: a specific test may well be preferred, even at the price of some loss of sensitivity
1. Experimental Studies:
The hallmark of the experimental study is that the allocation or assignment of individuals is under the control of the investigator and thus can be randomized. The key is that the investigator controls the assignment of the exposure or of the treatment, while the symmetry of potential unknown confounders is otherwise maintained through randomization. Properly executed experimental studies provide the strongest empirical evidence. Randomization also provides a better foundation for statistical procedures than observational studies offer.
a)
are randomly allocated to two or more treatment groups and the outcomes of the groups are compared after sufficient follow-up time. Properly executed, the RCT provides the strongest evidence of the clinical efficacy of preventive and therapeutic procedures in the clinical setting.
b)
c)
2. Observational Studies:
The allocation or assignment of factors is not under the control of the investigator. In an observational study, the combinations are self-selected or are "experiments of nature". For those questions where it would be unethical to assign factors, investigators are limited to observational studies. Observational studies provide weaker empirical evidence than do experimental studies because of the potential for large confounding biases to be present when there is an unknown association between a factor and an outcome. The symmetry of unknown confounders cannot be maintained. The greatest value of these types of studies (e.g., case series, ecologic, case-control, cohort) is that they provide preliminary evidence that can be used as the basis for hypotheses in stronger experimental studies, such as randomized controlled trials.
a)
b)
Case-Control Study:
A retrospective, analytical, observational study, often based on secondary data, in which the proportion of cases with a potential risk factor is compared to the proportion of controls (individuals without the disease) with the same risk factor. The common association measure for a case-control study is the odds ratio. These studies are commonly used for initial, inexpensive evaluation of risk factors and are particularly useful for rare conditions or for risk factors with long induction periods. Unfortunately, due to the potential for many forms of bias in this study type, case-control studies provide relatively weak empirical evidence even when properly executed.
c)
d)
e)
Case Series:
A descriptive, observational study of a series of cases, typically describing the manifestations, clinical course, and prognosis of a condition. A case series provides weak empirical evidence because of the lack of comparability unless the findings are dramatically different from expectations. Case series are best used as a source of hypotheses for investigation by stronger study designs, leading some to suggest that the case series should be regarded as clinicians talking to researchers. Unfortunately, the case series is the most common study type in the clinical literature.
f)
Case Report:
Anecdotal evidence. A description of a single case, typically describing the manifestations, clinical course, and prognosis of that case. Due to the wide range of natural biologic variability in these aspects, a single case report provides little empirical evidence to the clinician. They do describe how others diagnosed and treated the condition and what the clinical outcome was.
a)
General Objective
States what is expected to be achieved by the study in general terms. General objectives usually include problem verification and analysis of the causes of the problem. Example: The study aims at assessing the relative importance of patients' attitudes, health personnel's knowledge of early symptoms, and the availability of diagnostic facilities as causes of delay in diagnosing pre-eclampsia.
b)
Specific Objectives
The breakdown of the general objective into smaller, logically connected parts. Specific objectives systematically address the various aspects of the problem, including quantifying the distribution of the problem, identifying possible contributory factors, and stating what is expected at the end of the study (WHAT THE STUDY HOPES TO ACCOMPLISH). Specific objectives have to be concrete and specific. Please use action verbs.
Example
1. The study aims at establishing the frequency of cancellation of elective operations.
2. The study aims at comparing the use of ultrasound in prenatal diagnostics by junior and senior physicians.
3. The study aims at computing the correlation between the patient's age and the length of stay.
Formulation of objectives
Refer to the problem analysis diagram (loosely called a bubble chart). The objectives should cover the different aspects of the problem and be stated in a logical sequence, clearly phrased in operational terms (WHAT, WHERE and WHY). Be realistic rather than ambitious. Objectives must also be measurable and use action verbs (to determine, compare, describe, establish, calculate).
Study Hypothesis
A statement that predicts the relationship between one or more factors and the problem being studied, in a form that allows the prediction to be tested.
2. Journals
Medical journals, likewise, can be obtained from libraries that subscribe to them. Some libraries do not subscribe to many journals, which limits our search for reference material.
But do not worry: most well-known journals can now be accessed through their websites. Some provide material free of charge and some require a subscription.
3. Internet
Searching for reference material on the internet has changed the lives of investigators and researchers. Previously they depended on large libraries or on medical libraries at medical universities. Now, seated at an internet-connected computer with everything at their fingertips, researchers can find reference material in the blink of an eye. A brief outline of how to search for reference material on the internet is given here.
a)
Medical references.
Name: Site
American Family Physician: http://www.aafp.org/
Archive of Family Medicine: http://www.racgp.org.au/
Australian Family Physician: http://www.racgp.org.au/publications/afp_online.asp
British Medical Journal: http://bmj.bmjjournals.com/
Canadian Medical Journal: http://www.cmaj.ca/
Centre of EBM: http://www.cebm.net/
CPG Singapore: http://moh.gov.sg/pub/cpg/cpg.htm
McMaster EBM: http://www.cche.net/principles/content_all.asp
PubMed Central: http://www.pubmedcentral.nih.gov/
CPG New Zealand
Global Family Doctor
Malaysian Medical Association
PCDOM
Primary Care Clinical Guidelines
SIGN
Postgraduate Medicine
Mayo Clinic
AcadMed Malaysia
Malaysian Medical Resources
WHO
Bandolier
National Guideline Clearinghouse
Free Medical Journals: http://www.freemedicaljournals.com/
Evidence-based Nursing: http://ebn.bmjjournals.com/contents-by-date.0.shtml
Journal of Community Nursing: http://www.jcn.co.uk/journal.asp?showArt=no
b)
Name (Site): Resource
Google (http://www.google.com/): Good internet search engine with better safety features. Google also offers a desktop search of all material on the PC.
Gmail (http://gmail.google.com/): Free, search-based webmail service with large storage of 1,000 megabytes (1 gigabyte).
Picasa: Software that helps you instantly find, edit and share all the pictures on your PC. A free software download from Google.
Google Scholar (http://scholar.google.com/): Good search engine for scientific (academic) papers and results.
Medscape, also for MEDLINE search (http://www.medscape.com/): Free resource for physicians, with customised CME, medical journal articles, MEDLINE search, medical news, etc.
Biomail (http://www.biomail.org/): Customised new references from MEDLINE to your email account.
Cochrane Collaboration (http://www.cochrane.org): Up-to-date, accurate information about the effects of healthcare. Systematic reviews of healthcare interventions.
National Library of Medicine (http://www.nlm.nih.gov/database/)
http://www.cche.net/
Malaysian Medical Resources: Web page run by colleagues to support colleagues and clients (Alan Teh aka Palmdoc, TE Cheah). Also connects with Palmdoc and gives ideas on how to use a Palm.
4. Compact Disc
Available when you buy reference books such as the Harrison medical textbook.
5. Medline
Searching through Medline is the most successful search method so far, because Medline can search for specific medical terms that match what the user needs. You can run a Medline search free of charge through the PubMed Central website of the National Center for Biotechnology Information. It holds free journal material: more than 300,000 items from 150 online journals. Most importantly, it is free for the public. Please visit this website to begin your search (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PMC).
How to do study
Epidemiological surveys use various study designs and range widely in size. At one extreme a case-control investigation may include fewer than 50 subjects, while at the other, some large longitudinal studies follow up many thousands of people for several decades. The main study designs will be described in later chapters, but we here discuss important features that are common to the planning and execution of surveys, whatever their specific design.
1. Early planning
The success of data collection requires careful preparation. The first and often the most difficult question is "Why am I doing this survey?" Many studies start with a general hope that something interesting will emerge, and they often end in frustration. The general interest has first to be translated into precisely formulated, written objectives. Every survey should be reasonably sure to give an adequate answer to at least one specific question. This initial planning requires some idea of the final analysis; and it may be useful at the outset to outline the key tables for the final report, and to consider the numbers of cases expected in their major cells. Every study needs a primary purpose. It is easy to argue "While we have the subjects there, let's also measure..."; but overloading, whether of investigators or subjects, must be avoided if it in any way threatens the primary purpose. Sometimes subsidiary objectives may be pursued in subsamples (every nth subject, or in a particular age group) or by recalling some subjects for a second examination: when their initial contact has been favourable then response to recall is usually good.
2. Background reading
Before planning the detail of a study, it is wise to carry out a library search of the relevant background publications. Occasionally this may show the answer to the study question without any need for further data collection; or it may uncover useful sources of published information, such as the registrar general's mortality and cancer registry reports, which can form the basis of an analysis without the requirement for an expensive and time consuming field survey. Even when survey work remains necessary, experience in earlier related investigations may guide the design or indicate pitfalls to be avoided.
written direct into the coding boxes. Others, such as occupation, may need to be recorded in words and coded later as a separate exercise. Time spent writing is minimised if non-numerical information is, when possible, ringed or ticked rather than having to be written out. To minimise the chance of error, any reformulation of numerical data (for example, derivation of age at hospital admission from date of birth and date of admission) should be carried out by the computer after data entry, and not as part of the abstraction process. When coding data, allowance must be made for the possibility of missing information.
5. Questionnaires
Epidemiological data are often obtained by means of questionnaires. These may be either self administered (that is, completed by the subject) or administered at interview. Self administered questionnaires are easier to standardise because the possibility of systematic differences in interviewing technique is avoided. On the other hand, they are limited by the need to be unambiguously understood by all subjects. An interviewer may be essential to collect information on complex topics. Good design of questionnaires requires skill. The language used should be clear and simple. Two short questions, each covering one point, are better than one longer question which covers two points at once. A question that has been used successfully in a previous study has obvious advantages. The order of questions should take into account the sensitivities of the person to whom they are addressed - it is better to start with "What is your date of birth?" than launch straight into "Have you ever been treated for gonorrhoea?" - and should be designed to facilitate recall. For example, all questions relating to one phase of the person's life might be grouped together. As a check on the reliability of information, it may sometimes be helpful to include overlapping questions. In a study of risk factors for back pain, some people reported that their jobs entailed driving for more than four hours a day but did not involve more than two hours sitting. This suggests that they had not properly understood the questions. An
important consideration is whether to use closed or open ended questions. Closed ended questions, with one box for each possible answer (including "don't know") are more readily answered and classified, but cannot always collect information in the detail that is required. When interviewers are used then the wording with which they ask questions should be standardised as far as is compatible with the need to obtain useful information. As in abstracting existing records, the forms used to record answers to questions should be designed for ease and accuracy of completion and to simplify subsequent coding and analysis.
of the main study: new staff need supervised practice under realistic field conditions followed by pre-survey testing.
Despite all precautions, observer differences may persist. Observers should therefore be allocated to subjects in a more or less random way: if, for example, one person examined most of the men, and another most of the women, then observer differences would be confounded with true sex differences. To maintain quality control throughout the survey each examiner's identity should be entered on the record, and results for different examiners may then be compared.
Example 1: Last year your clinic had a high number of asthma cases among children. How can the incidence be reduced this year? The PICO method turns this into a study topic as follows:
P = Children with asthma attending Klinik Kesihatan Permaisuri.
I = Health education.
C = Control: patients who do not receive health education.
O = Reduce the incidence of asthma this year.
Search for material related to asthma, health education methods and their effectiveness. Find a reference expert with whom the study proposal can be discussed and who can critique it. Make sure the planned study suits current circumstances.
Topic derived: Case control study about Asthma Education among children attending Permaisuri Health Center from June to December 2008.
Example 2: The number of neonatal jaundice (NNJ) cases in Setiu District was observed to rise sharply last year. What would be a suitable study topic? Our brief investigation suggests that the increase was due to late notification of cases. Whether this is really so is not yet certain. We also want to know what other factors may be involved. Using the mnemonic given above, we can derive the topic of the study we will carry out.
P = Neonates born in Setiu District.
I = None.
C = All births during the study period.
O = Identify the actual factors causing NNJ.
Topic that can be derived: Cross Sectional study on High Incidence of Neonatal Jaundice in District Setiu from June to December 2008.
Data Collection
1. Types of Data
a)
Primary Data
Data obtained from the original source, that is, measured directly from the original population. It is obtained through direct examination or measurement, interviews or health records.
b)
Secondary Data
Data obtained from a second source that has undergone processing, such as annual reports.
2. Types of Variables
Dependent and Independent variables
In most cause-and-effect studies the researcher looks at the relationship between independent variables and dependent variables. The effect or outcome is the dependent variable; the cause is an independent variable.
Example: in a survey to investigate whether there is a relationship between mothers smoking cigarettes and the weight of newborns, the dependent variable is the newborn's weight and the independent variable is the mother's smoking habit.
If a researcher looks for a causal explanation, the characteristic of the problem under study may be called the DEPENDENT VARIABLE or OUTCOME. The INDEPENDENT VARIABLES are the characteristics or factors that are assumed to cause or influence the problem.
The variables to be studied are selected on the basis of their relevance to the objectives of the study. A variable may be independent or dependent according to the objectives of the study. The decision on which variables are IV or DV follows from the statement of the problem.
Example 1: if a researcher investigates whether smoking causes lung cancer, lung cancer is the dependent variable. But if he investigates why people smoke, then smoking would be the dependent variable.
Example 2: if a researcher wants to investigate whether the poor quality of hospital food influences patients not to take the hospital diet, then the number of patients not taking the food is the dependent variable. But if he wants to investigate why the hospital food is of poor quality, then the quality of food would be the dependent variable.
The problem to be studied is measured in terms of dependent variables. The factors influencing the problem are measured in terms of independent variables.
Confounding variable
A variable that is associated with the problem and with the possible cause of the problem. This variable may either strengthen or weaken the apparent relationship between an outcome and a possible cause.
[Diagram: Cause (independent), Effect/outcome (dependent), Other factors (confounding)]
Example: A relationship is shown between the low level of the mother's education (IV) and malnutrition (DV) in under-fives. However, family income is related to the mother's education as well as to malnutrition.
3. Measurement of Variables
a)
Qualitative
Categorised by the nature or characteristic that distinguishes them, for example ethnicity: Malay, Chinese, Indian, etc. Can be further divided into nominal and ordinal types.
Nominal: has no particular order or ranking, such as ethnicity (M, C, I and L).
Ordinal: there is a particular order or ranking among the categories, but the distance between categories is not known, such as job rank.
Other name: categorical data.
b)
Quantitative
The observations are numerical and are obtained by measuring, gauging or counting. There are 2 types:
Discrete: the result of counting, in whole numbers, e.g. number of children/wives.
Continuous: can take fractional values, the result of measurement, such as blood pressure or haemoglobin level.
Other name: numerical data.
a)
Conceptual definition
To define as it is conceived. Example: obesity is defined as excessive fatness, overweight, etc.
b)
Operational definition
(Working definition) The characteristics the investigator will actually measure. Example: obesity is defined as a weight based on weighing in underclothes and without shoes. Other examples are in Appendix 1. An operational definition of a variable forces the investigator to consider practicability. Appendix 2 gives a number of questions which may arise when attempting to define variables.
Sampling Method
1. Sample size
Most surveys and trials are smaller than the investigator would wish, lack of numbers often setting a limit to some desirable subgroup analysis. This is inevitable. What can be avoided is discovering only at the final analysis that numbers do not permit achievement even of the study's primary objective. To prevent this disappointment the purpose of the study has first to be formulated in precise statistical terms. If the aim is to estimate prevalence, then sample size will depend on the required accuracy of that estimate. (Table 5.1 gives some examples.) Sampling error is proportionally greater for less common conditions; that is to say, to achieve the same level of confidence requires a larger sample if prevalence is low.
Table 5.1 95% confidence limits for various rates and sample sizes
Estimated prevalence (%)    95% confidence limits
                            n=500        n=1000
2                           1.0-3.7      1.2-3.1
10                          7.5-13.0     8.2-12.0
20                          16.6-23.8    17.6-22.6
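Confidence limits of the kind shown in Table 5.1 can be approximated with a few lines of Python. The sketch below uses the Wilson score interval, which is an assumption made here for illustration: the booklet does not say which method produced the table, so the printed limits are close to, but not necessarily identical with, the tabulated ones. The function name `wilson_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(cases, n, confidence=0.95):
    """Wilson score confidence limits for a prevalence of cases/n."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = cases / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Same prevalences and sample sizes as Table 5.1.
for n in (500, 1000):
    for prevalence in (0.02, 0.10, 0.20):
        low, high = wilson_interval(round(prevalence * n), n)
        print(f"n={n} prevalence={prevalence:.0%}: {100 * low:.1f}-{100 * high:.1f}")
```

The interval narrows as n grows, which is the point the table makes: doubling the sample from 500 to 1000 tightens the limits but does not halve their width.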
Techniques also exist for calculating sample sizes required for estimating, with specified precision, the mean value of a variable, or for identifying a given difference in prevalence or mean values between two populations. These techniques may be found in textbooks or (better) by consulting a statistician; but either way the investigators must first know exactly what they want to achieve.
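As one illustration of such a technique, a common textbook approximation for the sample size needed to estimate a prevalence p to within a margin d is n = z^2 p(1-p) / d^2. The sketch below is a minimal version under stated assumptions: simple random sampling, a large population (no finite population correction) and a normal approximation. The function name `sample_size_for_prevalence` is hypothetical, and a statistician should still be consulted for a real study.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_prevalence(expected_prevalence, margin, confidence=0.95):
    """Approximate n so the confidence interval is expected_prevalence +/- margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = expected_prevalence
    # Round up: a fraction of a subject cannot be recruited.
    return ceil(z * z * p * (1 - p) / (margin * margin))


# Expected prevalence 10%, wanted precision +/- 2 percentage points.
print(sample_size_for_prevalence(0.10, 0.02))
```

The investigator has to supply both the expected prevalence and the acceptable margin, which is why the objectives must be stated precisely before any calculation is possible.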
2. Sampling methods
There are 2 main categories of sampling:
o Non-probability Sampling
o Probability Sampling
a)
Non-probability Sampling
Convenience Sampling
The sample is whoever happens to be available at the time or period of the research, selected for convenience's sake. The sample may not be representative of the population under study.
Quota Sampling
The sample is chosen from the sources available at the time until the investigator's quota is fulfilled. This method is only useful when a convenience sample would not provide the desired balance of elements in the population.
Note: Non-probability sampling cannot be used to quantify variables or to generalize the findings to the population.
b)
Probability Sampling
Employs random sampling procedures to ensure that each sampling unit is selected on the basis of chance. Every member of the population has a known chance of being included in the sample.
3. How to do Sampling
a)
Matching:
When confounding cannot be controlled by randomization, individual cases are matched with individual controls that have similar confounding factors, such as age, to reduce the effect of the confounding factors on the association being investigated in analytic studies. Most commonly seen in case-control studies.
b)
Restriction (Specification):
Eligibility for entry into an analytic study is restricted to individuals within a certain range of values for a confounding factor, such as age, to reduce the effect of the confounding factor when it cannot be controlled by randomization. Restriction limits the external validity (generalizability) to those with the same confounder values.
c)
Census:
A sample that includes every individual in a population or group (e.g., entire herd, all known cases). A census is not feasible when the group is large relative to the costs of obtaining information from individuals.
d)
Haphazard, Convenience, Volunteer, Judgmental Sampling:
Any sampling not involving a truly random mechanism. A hallmark of this form of sampling is that the probability that a given individual will be in the sample is unknown before sampling. The theoretical basis for statistical inference is lost and the result is inevitably biased in unknown ways. Despite their best intentions, humans cannot choose a sample in a random fashion without a formal randomizing mechanism.
e)
f)
Random Sampling:
Each individual in the group being sampled has a known probability of being included in the sample obtained from the group before the sampling occurs.
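A formal randomizing mechanism can be as simple as a pseudo-random number generator. The sketch below draws a simple random sample, in which every individual has the same known probability (sample size divided by population size) of being chosen. The sampling frame of 218 record numbers and the names used are hypothetical.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size distinct individuals, each equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(list(population), sample_size)


record_numbers = range(1, 219)  # hypothetical sampling frame: records 1 to 218
chosen = simple_random_sample(record_numbers, 20, seed=2012)
print(sorted(chosen))
```

Recording the seed alongside the study documents lets someone else reproduce exactly which records were selected.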
g)
h)
complicated statistical procedures (such as Mantel-Haenszel) in which the stratification is taken into account.
i)
Cluster Sampling:
Staged sampling in which a random sample of natural groupings of individuals (houses, herds, kennels, households, stables) is selected and then all the individuals within each selected cluster are sampled. Cluster sampling requires special statistical methods for proper analysis of the data and is not advantageous if the individuals are highly correlated within a group (a strong herd effect).
j)
Systematic Sampling:
From a random start in first n individuals, sampling every nth animal as they are presented at the sampling site (clinic, chute, ...). Systematic sampling will not produce a random sample if a cyclical pattern is present in the important characteristics of the individuals as they are presented. Systematic sampling has the advantage of requiring only knowledge of the number of animals in the population to establish n and that anyone presenting the animals is blind to the sequence so they cannot bias it.
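The procedure described above, a random start followed by every nth individual, can be sketched as follows. The function name and the list of 100 subjects are hypothetical.

```python
import random


def systematic_sample(subjects, interval, seed=None):
    """Take every interval-th subject after a random start in the first interval."""
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random position among the first `interval` subjects
    return subjects[start::interval]


arrivals = list(range(1, 101))  # hypothetical: 100 subjects in order of presentation
print(systematic_sample(arrivals, 10, seed=1))
```

If the order of presentation follows a cycle with the same period as the interval (for example, every tenth attendance is a follow-up visit), this sample would over- or under-represent that group, as the text warns.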
4. Recruiting subjects
Most people are willing to take part in medical surveys provided that they trust the investigators, just as patients will nearly always help their own doctors in their research. In population studies, however, there has usually been no previous contact. The selected subjects need an explanation of the purpose of the study, of why they in particular have been asked to take part, of what is expected from them, and what if anything they will get out of it (for instance a medical check up or a report on the research findings). Local general practitioners, too, need to
know what is going on. Time given to preparatory public relations is always well spent. Response must be made as easy as possible. If attendance at a centre is required, it is better to send everyone a provisional appointment than to expect them to reply to a letter asking whether they are willing to attend. Provision of transport may be welcomed. Often the difference between a mediocre response and a good one is tactful persistence, including second invitations (perhaps by recorded delivery), telephone calls, identifying the reasons for non-attendance, and home visits.
5. Response rates
The level of response that is acceptable depends both on the study question and on the population in which the question is being asked. Problems arise because non-responders may be atypical. For example, in a survey of coronary risk factors among adults registered with a group practice, those at highest risk may be the least inclined to complete a questionnaire or attend for examination. If a response rate of 85% were achieved, an estimated prevalence of heavy alcohol consumption of 3% among the responders could be substantially too low if most of the non-responders drank heavily. On the other hand an estimated 50% prevalence of smokers would not need major revision, even if all of the non-responders smoked. What matters is how unrepresentative non-responders are in relation to the study question. It is not important whether they are atypical in other respects. In a survey to evaluate the association between serum IgE concentrations and ventilatory function it would not matter if non-responders had an unusually high frequency of respiratory disease, provided that the relation of their ventilatory function to IgE was not unrepresentative. Assessment of the likely bias resulting from incomplete response is ultimately a matter of judgement. However, two approaches may help the assessment. Firstly, a small random sample can be drawn from the non-responders, and particularly
vigorous efforts made to encourage their participation, including home visits. The findings for this subsample will then indicate the extent of bias among nonresponders as a whole. Secondly, some information is generally available for all people listed in the study population. From this it will be possible to contrast responders and non-responders with respect to characteristics such as age, sex, and residence. Differences will alert the investigator to the possibility of bias. In addition, it may help to put absolute bounds on the uncertainty arising from non-response by making extreme assumptions about the non-responders. For example, if the aim of a survey were to estimate a disease prevalence, what would be the prevalence if all of the non-responders had the disease, or none of them?
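The extreme-assumption bounds are simple arithmetic. The sketch below applies them to the figures used earlier in this section (85% response, 3% heavy drinkers, 50% smokers among responders). The function name `prevalence_bounds` is hypothetical.

```python
def prevalence_bounds(response_rate, prevalence_in_responders):
    """Bounds on the true prevalence under extreme assumptions about non-responders."""
    # Lowest: none of the non-responders has the condition.
    lowest = response_rate * prevalence_in_responders
    # Highest: all of the non-responders have the condition.
    highest = lowest + (1 - response_rate)
    return lowest, highest


for label, observed in (("heavy alcohol consumption", 0.03), ("smoking", 0.50)):
    low, high = prevalence_bounds(0.85, observed)
    print(f"{label}: between {100 * low:.2f}% and {100 * high:.2f}%")
```

For heavy drinking the upper bound is several times the observed 3%, whereas for smoking the bounds stay fairly close to 50%, which is the contrast the text draws.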
6. Analysis
Small studies can sometimes be analysed manually with the help of a calculator. Nowadays, however, the analysis of epidemiological data is almost always carried out by computer. With recent advances in technology, all but the largest data sets can be handled satisfactorily on a personal computer. Moreover, a wide range of software packages is now available to assist epidemiological analysis. The starting point for analysis by computer is the coding and entry of data. These procedures should be checked, usually by carrying them out in duplicate. In addition, once the data have been entered, further checks should be made to ensure that all codes are valid (for example, nobody should have 31 February as a birth date) and to look for any internal inconsistencies (such as a date of admission to hospital being earlier than the subject's date of birth). Statistical analysis should only begin when the data set is as "clean" as possible. With the ready availability of software packages, it is tempting for medical investigators to embark on analyses they do not fully understand, and in the process they may use inappropriate statistical techniques. For this reason it is preferable to obtain advice from a statistician when carrying out all but the simplest analyses. As with the earlier stages of data processing, statistical calculations should all be checked.
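The two kinds of check mentioned above, valid codes and internal consistency, can be sketched in Python as follows. The record layout (dates held as year, month, day tuples under the keys "birth" and "admission") is a hypothetical one chosen only for the example.

```python
from datetime import date


def parse_date(year, month, day):
    """Return a date, or None if the combination is impossible (e.g. 31 February)."""
    try:
        return date(year, month, day)
    except ValueError:
        return None


def check_record(record):
    """List the problems found in one record; an empty list means it looks clean."""
    problems = []
    birth = parse_date(*record["birth"])
    admission = parse_date(*record["admission"])
    if birth is None:
        problems.append("invalid date of birth")
    if admission is None:
        problems.append("invalid date of admission")
    # Internal consistency: nobody can be admitted before being born.
    if birth is not None and admission is not None and admission < birth:
        problems.append("admission before birth")
    return problems


print(check_record({"birth": (1980, 2, 31), "admission": (2012, 5, 15)}))
print(check_record({"birth": (1990, 1, 1), "admission": (1985, 1, 1)}))
```

Running such checks over every record before analysis gives a list of entries to verify against the original forms.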
Inferential Statistics
When we carry out a study, we want to draw an inference from the data collected, for example "Drug A is better than drug B in treating disease C". The null hypothesis would then read: "there is no difference in effectiveness between drug A and drug B in treating disease C". Inferential statistics then determine whether or not there is a significant difference in effectiveness between drug A and drug B. If there is a significant difference, the null hypothesis is rejected, that is, there is a significant difference in effectiveness between the 2 drugs (p<0.05). Conversely, if there is no significant difference, the null hypothesis is not rejected, that is, there is no significant difference in effectiveness between drug A and drug B in treating disease C (p>0.05). The significance level used to decide whether or not to reject the null hypothesis is usually set at 0.05 or 0.01. In the example above it is set at 0.05. The confidence level is 1 minus the significance level: if the significance level is 0.05, the confidence level is 95%. How to calculate the p value for inferential statistics is explained later.
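To make the decision rule concrete, the sketch below compares the cure proportions of two drugs with a two-proportion z-test. This test is chosen here only because it needs nothing beyond the Python standard library; the booklet does not say which test would be used for the drug A versus drug B example, and the counts (80 of 100 cured versus 65 of 100) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test of H0: both groups have the same success proportion."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)  # proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


ALPHA = 0.05  # significance level set before looking at the data
z, p_value = two_proportion_z_test(80, 100, 65, 100)  # hypothetical counts
decision = "reject H0" if p_value < ALPHA else "do not reject H0"
print(f"z={z:.2f}, p={p_value:.4f}: {decision}")
```

With these made-up counts the p value falls below 0.05, so the null hypothesis of equal effectiveness would be rejected at that level.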
Types of Tests for Quantitative Data
Parametric
Independent t-test (Student's t-test)
Paired t-test
ANOVA
Correlation & regression
Non-parametric
Wilcoxon rank sum test
Mann-Whitney test
Kruskal-Wallis
2. Errors
Even after the significance level and confidence level have been set, errors remain possible. There are 2 types of error: Type I error and Type II error.
True state                                          Conclusion of the significance test
                                                    Null hypothesis not rejected    Null hypothesis rejected
Null hypothesis true (H0 should not be rejected)    Correct conclusion              Type I error
Null hypothesis false (H0 should be rejected)       Type II error                   Correct conclusion
Type I error: rejecting the null hypothesis when it is in fact true (e.g. when means/proportions are compared, a small difference exists but is found to be significant, so the null hypothesis is rejected. This may be caused by problems such as too large a sample size).
Type II error: not rejecting the null hypothesis when it is in fact false (e.g. when means/proportions are compared, a difference exists but is found not to be significant, so the null hypothesis is not rejected. This may be caused by problems such as too small a sample size).
CHAPTER IV.
Getting started
This learning can be done in 2 ways: online, or by practising with the notes provided. To obtain the notes, you can download the following files and print them (Adobe Acrobat *.pdf format; please visit the website http://161.142.92.99/hululangat/stat/). You are also supplied with several additional files to use as practice material: sga.sav and sga.dbf. You can also get additional notes from the website http://www.iiumedic.net/biostatistics/v1/, written by Dr Jamaluddin Abdul Rahman, Biostatistics Lecturer, UIA Kuantan.
The top row holds the variable names, and the individual data are then arranged in order below it. Only after that are the data entered into the SPSS software. Before the data are entered, we need to prepare a place for each variable in SPSS. First list the variable names and their types, either categorical (string in SPSS) or numerical (numeric in SPSS). The variable names must follow these rules:
Unique, different from one another
Only 8 letters or fewer
Alphanumeric only, no symbols such as %.,*& or SPACE
Meaningful, so that they are easy to understand, e.g. n1rekod, meaning the first question, about the record number.
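Before typing the names into SPSS, the first three rules above can be checked mechanically. The sketch below is only an illustration written in Python, not an SPSS feature; the function name is hypothetical, and treating names that differ only in letter case as duplicates is an assumption. The fourth rule, that a name be meaningful, still needs human judgement.

```python
def check_variable_names(names):
    """Return {name: [problems]} for names that break the naming rules."""
    problems = {}
    seen = set()
    for name in names:
        issues = []
        if len(name) > 8:
            issues.append("more than 8 characters")
        if not (name.isascii() and name.isalnum()):
            issues.append("contains a symbol or space")
        # Assumption: names differing only in case count as duplicates.
        if name.lower() in seen:
            issues.append("duplicate name")
        seen.add(name.lower())
        if issues:
            problems[name] = issues
    return problems


# Hypothetical list of planned variable names.
print(check_variable_names(["norekod", "n1rekod", "bmi%", "agegroup1", "NOREKOD"]))
```

An empty result means every name passed the three mechanical checks.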
The coding for each variable must also be decided beforehand (e.g. for ethnicity M=Malay, C=Chinese, etc.) for the data to be entered.
3. Move the cursor to the menu DATA > DEFINE VARIABLE (or move the cursor onto the first variable-name row, right-click and choose DEFINE VARIABLE). Requester
4. Enter the variable name in the Variable Name box. For this example, enter "norekod". Then click the Type button. The following requester will appear.
5. Since the variable norekod is only an identifier variable and will not be analysed, choose the string type and set the number of characters to 3 (because there are 218 observed cases, 3 characters are needed). Click CONTINUE. In the previous requester, now click COLUMN FORMAT. The following requester will appear.
6. Set COLUMN WIDTH to 8 and TEXT ALIGNMENT to CENTER. This will make data entry easier later. Then click OK. The variable shown in the DATA EDITOR is as follows.
7. Do the same for the remaining variables (refer to the data appendix), namely:
Exercise 1: Entering variable names and types in the SPSS spreadsheet
Please enter the variable names and types into the SPSS spreadsheet.
Variable Name    Type    Column Formatting (Width)    Number (Decimal)
Age                      3                            1
Race                     4                            1
Residenc                 8                            0
Marital                  7                            0
Education                8                            1
Typework                 8                            1
3. Enter the word RACE in the VARIABLE LABEL box. In the VALUE box, enter the value 1. Then enter the word MALAY in the VALUE LABEL box. Press the ADD button. Do the same for 2=CHINESE, 3=INDIAN and 4=OTHERS. The final result should look like this.
4. Press the CONTINUE button and then the OK button.
5. As a trial, enter the values 1, 2, 3 and 4 in the RACE column as in the figure below.
7. The data will now look like this. This is what labels are for. The same labels will be used in tables, charts and any other output produced from this variable. It is therefore better to use labels exactly as you want them to appear in the final report (e.g. in English or Bahasa Malaysia), because the charts or tables produced can be pasted directly from SPSS into a word processor such as Word 2003.
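The idea behind value labels, storing a short code and displaying a readable label, can be illustrated outside SPSS with a simple lookup table. The Python sketch below uses the RACE codes from the steps above; showing the bare code when no label has been defined is an assumption made for the example.

```python
RACE_LABELS = {1: "MALAY", 2: "CHINESE", 3: "INDIAN", 4: "OTHERS"}


def label_values(codes, labels):
    """Replace each stored code with its label; unlabelled codes are shown as-is."""
    return [labels.get(code, str(code)) for code in codes]


entered = [1, 2, 3, 4]  # the trial values typed into the RACE column
print(label_values(entered, RACE_LABELS))
```

The stored data stay numeric, so they remain easy to enter and analyse, while tables and charts show the labels.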
Exercise 2: Completing variables and labels in the SPSS spreadsheet
As an exercise, complete the following labels.
Variable: Label
Marital: 0=single, 1=married, 2=divorced/widowed
Education: 1=Nil, 2=Primary, 3=Secondary, 4=Tertiary
Typework: 1=Housewife, 2=Office work, 3=Fieldwork
1. Compute
1. Open the file by clicking the menu FILE>OPEN. Change the directory to (CD:)>bahan>latihan dr azmi>. Select the file sga.sav and click OPEN. (If the data are stored on the CD.)
2. We will now create a new variable, BMI (Body Mass Index), from the variable WEIGHT1 (weight during the first trimester) and the variable HEIGHT (the respondent's height). The BMI formula is weight (kg) / height^2 (m^2).
3. Click the menu TRANSFORM>COMPUTE (as in the figure).
4. The COMPUTE VARIABLE requester will appear. Complete it as in the figure below. Then click OK.
5. Now look at the DATA EDITOR; the new variable BMI will be visible (you may have to scroll to the right).
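The same computation can be sketched in Python to show what the COMPUTE step does to each row. The two records below are hypothetical, and the sketch assumes HEIGHT is stored in metres; if the file stores height in centimetres, divide it by 100 first.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


# Hypothetical rows using the variable names from the exercise.
records = [
    {"WEIGHT1": 55.0, "HEIGHT": 1.58},
    {"WEIGHT1": 72.5, "HEIGHT": 1.62},
]

for row in records:
    # Adds a new BMI column to each row, as COMPUTE adds a new variable.
    row["BMI"] = round(bmi(row["WEIGHT1"], row["HEIGHT"]), 1)

print(records)
```

Each row gains a BMI value derived from its own weight and height, leaving the original variables unchanged.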
Exercise 3: Please carry out calculations, using formulas you know, to:
i. Convert the baby's weight from kg to grams.
ii. Convert the mother's weight during the first trimester from kg to grams.
iii. Compute a comparison of the mother's weight during the first trimester with the baby's weight.
2. Recode
1. We will now recode AGE from continuous data into AGEGROUP (age group), namely <=20, 21-30, 31-40 and >40.
2. Click the menu TRANSFORM>RECODE>INTO DIFFERENT VARIABLES.
3. In the requester that appears, select AGE from the left box and press the right ARROW. Then enter AGEGROUP in the OUTPUT VARIABLE:NAME box and click CHANGE. It will look like the figure below.
4. Now click the OLD AND NEW VALUES button. The following requester will appear.
5. Select as above and click ADD. Change 21-30 to VALUE 2, 31-40 to VALUE 3, and 41 THROUGH HIGHEST to VALUE 4. When finished, it will look like the figure below. Press CONTINUE and then OK.
6. Scroll to the right to see the new variable AGEGROUP. To complete this step, enter the value labels for AGEGROUP through DATA>DEFINE VARIABLE. The labels are 1="less than 21 years", 2="21 to 30 years", 3="31 to 40 years" and 4=">40 years".
Exercise 4: Recode the following variables:
i. Recode AGE again into a new variable, AGERISK, made up of 2 groups: those aged 19 to 35 years (coded 0) and those aged 36 years and above (coded 1).
ii. Recode BMI into the following categories:
BMI          Code   Category
< 18         1      Low BMI
18.1-25.0    2      Normal BMI
25.1-27.0    3      Overweight
27.1-30.0    4      Obese type 1
30.1-35.0    5      Obese type 2
> 35         6      Morbidly obese
iii. Recode Hemoglobin 2 (lowest hb at 2) into the following categories: less than 9 (code 1), severe anaemia; 9 to 11.0 (code 2), moderate anaemia; and more than 11 (code 3), normal.
iv. How would we recode data that are missing or have no input? Example: the variable Reflolux
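The AGE to AGEGROUP recode described in steps 1 to 6 above can be expressed as a small Python function, which may help when checking the cut-off points before entering them in the OLD AND NEW VALUES dialog box. The function and variable names are illustrative only and ages are assumed to be whole years.

```python
# Value labels as entered for AGEGROUP in step 6.
AGEGROUP_LABELS = {
    1: "less than 21 years",
    2: "21 to 30 years",
    3: "31 to 40 years",
    4: ">40 years",
}


def age_group(age):
    """Map an age in whole years to the AGEGROUP codes 1 to 4."""
    if age <= 20:
        return 1
    if age <= 30:
        return 2
    if age <= 40:
        return 3
    return 4


sample_ages = [19, 20, 21, 30, 31, 40, 41, 46]  # hypothetical ages
for age in sample_ages:
    code = age_group(age)
    print(age, code, AGEGROUP_LABELS[code])
```

Testing the boundary ages (20, 21, 30, 31, 40, 41) is a quick way to confirm that no age falls between two groups.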
Exploring
In exploring your data, you will be producing summary statistics and graphical displays, either for all the collected data or separately for groups of cases. There are many reasons why you would want to explore your data, among them: data screening; outlier identification; description; assumption checking; and characterizing differences among groups of cases (subpopulations).
Data screening may show that you have unusual values, extreme values, gaps in the data or other peculiarities. Exploring the data can help determine whether: the statistical techniques chosen are appropriate; you need to transform the data prior to analysis; or you may need to conduct non-parametric tests.
Among the statistical output and plots that help in exploring the data are:
Mean, median, 5% trimmed mean, standard error, variance, standard deviation, minimum, maximum, range, interquartile range, skewness and kurtosis and their standard errors, confidence interval for the mean (and specified confidence level), percentiles.
Huber's M-estimator, Andrews' wave estimator, Hampel's redescending M-estimator, Tukey's biweight estimator, the five largest and five smallest values, the Kolmogorov-Smirnov statistic with a Lilliefors significance level for testing normality, and the Shapiro-Wilk statistic.
Boxplots, stem-and-leaf plots, histograms, normality plots, and spread-versus-level plots with the Levene test and transformations.
Select one or more factor variables, whose values will define groups of cases.
Select an identification variable to label cases. (Choose the variable INDEX.)
Click Statistics for robust estimators, outliers, percentiles, and frequency tables.
Click Plots for histograms, normal probability plots and tests, and spread-versus-level plots with Levene's statistic.
Click Options for the treatment of missing values.
Then click OK. Some of the results that appear in the DATA OUTPUT window are as follows. Explore: SGA
AGE Histograms
Stem-and-Leaf Plots
AGE Stem-and-Leaf Plot for CASE= Normal
 Frequency    Stem &  Leaf
     1.00        1 .  9
     5.00        2 .  01111
     9.00        2 .  222233333
    15.00        2 .  444444555555555
    17.00        2 .  66666666666677777
    10.00        2 .  8888899999
    14.00        3 .  00000000111111
     8.00        3 .  23333333
     9.00        3 .  444445555
     7.00        3 .  6667777
     6.00        3 .  888889
     3.00        4 .  111
     1.00        4 .  3
     2.00        4 .  44
     1.00        4 .  6
 Stem width: 10
 Each leaf:  1 case(s)
From this output you can see how the data are distributed: whether or not the distribution is normal, whether there are outliers, and so on.
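Because the stem-and-leaf plot keeps every individual value (stem width 10, one case per leaf), the ages it displays can be rebuilt and summarised by hand. The Python sketch below does this for the AGE plot of the Normal group; the list of rows is transcribed from the plot above, and the variable names are illustrative only.

```python
from statistics import mean, median

STEM_WIDTH = 10  # "Stem width: 10" in the SPSS output

# (stem, leaves) pairs transcribed from the AGE stem-and-leaf plot, CASE = Normal.
rows = [
    (1, "9"), (2, "01111"), (2, "222233333"), (2, "444444555555555"),
    (2, "66666666666677777"), (2, "8888899999"), (3, "00000000111111"),
    (3, "23333333"), (3, "444445555"), (3, "6667777"), (3, "888889"),
    (4, "111"), (4, "3"), (4, "44"), (4, "6"),
]

# Each leaf is one case: value = stem * stem width + leaf digit.
ages = [stem * STEM_WIDTH + int(leaf) for stem, leaves in rows for leaf in leaves]

summary = {
    "n": len(ages),
    "minimum": min(ages),
    "maximum": max(ages),
    "median": median(ages),
    "mean": round(mean(ages), 2),
}
print(summary)
```

A mean slightly above the median, together with the longer tail of leaves on the stems for the forties, suggests a mild right skew rather than a perfectly symmetric distribution.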
Exercise 5: Explore the following variables:
i. WEIGHT1 as the dependent variable.
ii. WEIGHT2 as the dependent variable.
iii. WEIGHTG1 as the dependent variable.
iv. WEIGHTG2 as the dependent variable.
v. WEIGHTG3 as the dependent variable.
Discuss the results you obtain. You are sure to find something interesting!
Frequencies
The Frequencies procedure provides statistics and graphical displays that are useful for describing many types of variables. For a first look at your data, the Frequencies procedure is a good place to start. For a frequency report and bar chart, you can arrange the distinct values in ascending or descending order or order the categories by their frequencies. The frequencies report can be suppressed when a variable has many distinct values. You can label charts with frequencies (the default) or percentages.
Choose FREQUENCIES again and click the RESET button. This time we will select a numerical variable, NTVISIT (number of antenatal visits). Deselect the DISPLAY FREQUENCIES TABLE box.
Click Statistics for descriptive statistics for quantitative variables. Select MEAN, MODE, MEDIAN, VARIANCE, MINIMUM, MAXIMUM, STANDARD DEVIATION, SKEWNESS and KURTOSIS (as in the figure).
Click Format for the order in which results are displayed. Here you can also choose to suppress tables with more than 10 categories.
Exercise 6: Obtain the frequencies for the following variables:
1. Hemoglobin 3rd Trimester
2. Weight Gain at 3rd Trimester
The same results can also be obtained with the command STATISTICS > SUMMARISE > DESCRIPTIVES.
a) Example
Importing a file is easy. The example shown here uses the dBase IV format, namely sga.dbf.
1. Click FILE>OPEN. In the dialog box that appears, change the LOOK IN box to 3.5" Floppy (A:). In the FILES OF TYPE box, choose dBase (*.dbf). The file name sga.dbf will appear in the file list. Select sga.dbf and click OPEN. (This applies if sga.dbf has been saved on floppy disk A:.)
2. The data go straight into the DATA EDITOR, and the processing messages are listed in the DATA OUTPUT window. The only oddity with dBase files is an extra variable, d_r, in the first column, which needs to be deleted. Select the d_r column, then click EDIT>CLEAR.
3. After this, the imported variables can be modified using the DEFINE VARIABLE command.
Exercise 8: Import the Excel-format file (found on the CD, file name: Drug T Student) into SPSS.
Your success in importing the data is the facilitator's success too, having worked to make you proficient in SPSS.
COPY & PASTE
1. Two things are commonly copied and pasted from SPSS: tables and charts. Charts are the easiest, so we will start with them.
2. Make sure both the word processor (e.g. Word) and SPSS are open. Select the chart to be copied in the DATA OUTPUT window by left-clicking on it once. A red marker will appear to its left.
3. Then click EDIT>COPY OBJECTS (or CTRL+ALT+C). Use the TASKBAR to switch to WORD. Click EDIT>PASTE (CTRL+V). You can also choose PASTE SPECIAL; make sure the FORMATTED RTF/DOC type is selected.
4. The pasted item behaves like any other image. If anything needs to be changed, change it in SPSS first, before pasting.
5. To copy a table, make sure Excel is open as well. Select the table to be copied in the DATA OUTPUT window by left-clicking on it once. A red marker will appear to its left.
6. Then click EDIT>COPY OBJECTS (or CTRL+ALT+C). Use the TASKBAR to switch to WORD. Click EDIT>PASTE (CTRL+V). The table will look almost the same as in DATA OUTPUT. Unfortunately it cannot be edited at all; if you click on it, the layout falls apart. If anything needs to be changed, change it in SPSS first, before pasting.
7. A better way is to go through EXCEL. As before, select the table first. When copying, however, use EDIT>COPY (or CTRL+C). Use the TASKBAR to switch to EXCEL and click EDIT>PASTE (CTRL+V).
8. Edit the table with the usual EXCEL commands. Then select the table again in EXCEL, COPY it, and only then PASTE it into WORD.
Table 1: Frequency of anaemia among pregnant mothers at HUSM
              Frequency    Percent
Normal           196       89.90826
Anaemia           18        8.256881
Unknown            4        1.834862
Total            218      100
Exercise 9: Try COPY & PASTE yourself using other charts and tables.
4. Choose If condition is satisfied.
5. Select the variable smoking in the left-hand panel and click the small arrow to move it into the dialog box.
6. The formula must be constructed with care. What you want now is to analyse only the patients exposed to passive smoking, so you have to exclude the data for patients who were not exposed.
7. The formula smoking ~= 0 is suitable for this.
8. Click Continue>OK.
9. A slash (/) will mark the cases that are excluded from the analysis.
10. Remember that you must deselect the cases again before running other analyses. Click Data->Select Cases, then click the Reset button.
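The Select Cases condition above (smoking ~= 0, i.e. "smoking not equal to 0") can be sketched in Python as a simple filter. The case list is invented for illustration, and the coding 0 = not exposed is an assumption implied by the formula in step 7.

```python
# Hypothetical cases; assumption: smoking == 0 means "not exposed to passive smoking".
cases = [
    {"id": 1, "smoking": 0},
    {"id": 2, "smoking": 1},
    {"id": 3, "smoking": 0},
    {"id": 4, "smoking": 1},
]

# Equivalent of "If condition is satisfied" with the formula smoking ~= 0.
selected = [case for case in cases if case["smoking"] != 0]
excluded = [case for case in cases if case["smoking"] == 0]  # the rows SPSS marks with a slash

print("analysed:", [case["id"] for case in selected])
print("excluded:", [case["id"] for case in excluded])
```

Note that the filter builds a new list and leaves `cases` unchanged, which mirrors the fact that SPSS only hides the unselected cases until the selection is reset.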
CHAPTER V.
Parametric Analysis
Z-Test
Qualitative data are analysed using proportions. To compare proportions between study groups, statistical tests such as the Z-test and the chi-square test can be used.
a) Z-Test
Used to compare 2 proportions. Formula:
$$ z = \frac{p_1 - p_2}{\sqrt{p_0 q_0 \left[ \frac{1}{n_1} + \frac{1}{n_2} \right]}} $$
where
$p_1$ is the proportion for event 1 $= a_1/n_1$
$p_2$ is the proportion for event 2 $= a_2/n_2$
$a_1$ and $a_2$ are the numbers of events 1 and 2
$$ p_0 = \frac{p_1 n_1 + p_2 n_2}{n_1 + n_2}, \qquad q_0 = 1 - p_0 $$
The normal distribution table is used to decide whether or not to reject the null hypothesis.
Example calculation
Comparison of worm infestation rates between males and females at a school.
Rate for males = 29/96 = 0.302
Rate for females = 24/104 = 0.231
$$ p_0 = \frac{29 + 24}{96 + 104} = 0.265, \qquad q_0 = 1 - 0.265 = 0.735 $$
$$ z = \frac{0.302 - 0.231}{\sqrt{(0.735 \times 0.265)\left[ \frac{1}{96} + \frac{1}{104} \right]}} \approx 1.14 $$
From the normal distribution table, the z value that is significant at the 0.05 level is 1.96. Since the calculated z is smaller than 1.96, there is no significant difference between the 2 proportions.
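The worked example can be checked with a few lines of Python. This is a minimal sketch of the two-proportion z formula given above, applied to the worm infestation counts; the function name is illustrative and not part of SPSS.

```python
import math


def two_proportion_z(a1, n1, a2, n2):
    """z statistic for comparing two proportions using the pooled proportion p0."""
    p1 = a1 / n1
    p2 = a2 / n2
    p0 = (p1 * n1 + p2 * n2) / (n1 + n2)  # pooled proportion
    q0 = 1 - p0
    return (p1 - p2) / math.sqrt(p0 * q0 * (1 / n1 + 1 / n2))


# Males: 29 infested out of 96; females: 24 infested out of 104.
z = two_proportion_z(29, 96, 24, 104)
significant = abs(z) > 1.96  # two-sided test at the 0.05 level

print(round(z, 2), significant)
```

The z value comes out at about 1.14, which is below 1.96, so the null hypothesis of equal proportions is not rejected.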
Expected table
            +        -       Total
+         eg/n     eh/n       e
-         fg/n     fh/n       f
Total       g        h        n
The chi-square value is calculated by summing (observed - expected)^2 / expected over the cells of the table.
$$ \chi^2 = \sum \frac{(O - E)^2}{E} $$
Degrees of freedom: df = (number of rows - 1)(number of columns - 1)
Example calculation
Observed table
            +        -       Total
+          29       67        96
-          24       80       104
Total      53      147       200
Expected table
            +                      -                       Total
+         96*53/200 = 25.44      96*147/200 = 70.56        96
-         104*53/200 = 27.56     104*147/200 = 76.44      104
Total      53                    147                      200
Therefore
$$ \chi^2 = \frac{(29 - 25.44)^2}{25.44} + \frac{(24 - 27.56)^2}{27.56} + \frac{(67 - 70.56)^2}{70.56} + \frac{(80 - 76.44)^2}{76.44} = 1.303 $$
In the chi-square table, the critical value at df = 1 and the 0.05 significance level is 3.84. Because the calculated chi-square value is smaller than the table value, there is no significant difference between the proportions. The null hypothesis is therefore not rejected.
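The same chi-square calculation can be reproduced in Python as a check on the hand arithmetic. The sketch below builds the expected counts from the row and column totals and sums (O - E)^2 / E over the cells; the function name is illustrative and no continuity correction is applied.

```python
def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (count - expected) ** 2 / expected

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Observed counts from the worked example: rows 29/67 and 24/80.
chi2, df = chi_square([[29, 67], [24, 80]])
print(round(chi2, 3), df)
```

For a 2x2 table this statistic equals the square of the z value from the two-proportion test, which is why both tests lead to the same conclusion here.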
3. In the dialog box that appears, enter the variables to be tested. Usually the risk factor (SMOKING) is placed in the rows and the disease (CASE) in the columns. More than one variable may be entered. Click the Statistics button and select chi-square. Click "Continue" and then click the CELLS button. Select ROW PERCENT. Click CONTINUE and then click "OK".
4. SPSS will then run the chi-square test and the "Output" window will appear showing the results of the analysis, as below.
5. This shows that among passive smokers, the percentage with SGA (57.1%) is higher than the percentage without SGA (42.9%). From the next table, the chi-square value is 10.328 and the p value is 0.001. There is therefore a significant association between passive smoking and SGA.
6. The table drawn up for the thesis report is as shown below.
Table 1: Contingency table showing the association between smoking exposure and SGA.
Group              Normal    SGA
Non-smoker           41       20
Passive smoker       67       89
Total               108      109
X2 = 10.328, p = 0.001
7. If, in a 2x2 table, any expected cell count is less than 5, the p and X2 values to read are those in the CONTINUITY CORRECTION row. This is the same as the Yates correction. If the sample size is smaller still, that is less than 40, the p and X2 values to read are those in the Fisher's Exact Test row.
Exercise 10: Interpret the following table.
Independent T Test
Used to compare the means of 2 independent groups, for example the mean Hb of cases and controls. Two variables are involved: one quantitative variable and one qualitative variable with only 2 possible values (e.g. sex: male and female).
General formula: t =
1. First, open the data file.
2. Then click Analyze ->Compare Means ->Independent Samples T Test (as in the figure below).
4. In the dialog box that appears, enter the variables to be tested. In the "Test Variable(s):" box, enter the quantitative variable to be tested (chhamd6). More than one quantitative variable may be entered.
5. In the "Grouping Variable:" box, enter the qualitative variable (drug), then click the "Define Groups" button and enter the groups to be compared (S and F). Click "Continue" and then "OK".
6. SPSS will then run the independent t test and the "Output" window will appear showing the results of the analysis, as below.
7. This shows the sample size (N), mean and standard deviation of chhamd6 for groups S and F.
8. First, look at the p value (Sig.) for Levene's Test. If p>0.05, use the "equal variances assumed" row. If p<0.05, use the "equal variances not assumed" row. In the case above p=0.850, so we use the "equal variances assumed" row. The p value there is 0.755, that is p>0.05, so there is no significant difference between the 2 drugs in the change in HAMD score after 6 weeks of treatment.
9. The table drawn up for the thesis report is as shown below.
Table 1: Mean change in HAMD score after 6 weeks of treatment, by treatment group.
Group    N     Mean ± SD       Test                 p
S        32    8.38 ± 1.60     T test, t = 0.313    0.755
F        34    8.50 ± 1.64
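As a rough check on the reported result, the t statistic can be recomputed from the summary figures in the table. The sketch below assumes the standard equal-variance (pooled) form of the independent t test, which corresponds to the "equal variances assumed" row; the function name is illustrative. Because the means and standard deviations are rounded to 2 decimal places, the result is close to, but not exactly, the 0.313 that SPSS computes from the raw data, and the sign only reflects which group is entered first.

```python
import math


def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Independent-samples t statistic assuming equal variances (pooled SD)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Summary statistics for groups S and F, taken from the table above.
t_value, df = pooled_t(32, 8.38, 1.60, 34, 8.50, 1.64)
print(round(t_value, 2), df)
```

With 64 degrees of freedom the critical t value at the 0.05 level is about 2.0, so a statistic of roughly 0.3 in absolute value is far from significant, in line with p = 0.755.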
Paired T Test
Used when a quantitative variable is compared within the same individuals, for example when each individual serves as both control and case in the same study, that is, before and after an intervention. It can also be used for cases and controls that have been matched on criteria such as age, sex and ethnicity (matched pairs). The test therefore involves 2 paired quantitative variables in one study. The procedure is as follows: first calculate the difference between the first and second values for each individual in the study, D. Then calculate the mean of D and its standard deviation. From these 2 values, t is calculated with the formula below:
$$ t = \frac{\bar{D}}{s_D / \sqrt{n}} $$
where
$$ \bar{D} = \frac{\sum D}{n} \quad \text{and} \quad s_D = \sqrt{\frac{\sum (D - \bar{D})^2}{n - 1}} $$
and $n$ is the number of pairs.
3. In the dialog box that appears, enter the variables to be tested. In the "Test Variable(s):" box, enter the pair of quantitative variables to be tested (hb2 and hb3). More than one pair of quantitative variables may be entered. Click "OK".
5. SPSS will then run the paired t test and the "Output" window will appear showing the results of the analysis, as below.
6. This shows the number of pairs (N), the mean and the standard deviation for the pair.
7. This shows the correlation between the pair above. There is no significant correlation (p>0.05).
7. The p value is 0.004, that is p<0.05, so there is a significant difference in haemoglobin after 6 weeks of treatment with the haematinic. The means show that the mean after treatment is higher than the mean before treatment. This indicates that patients improved on haematinic treatment.
8. The table drawn up for the thesis report is as shown below.
Table 1: Mean haemoglobin before and after 6 weeks of haematinic treatment.
Group               N     Mean ± SD        Test                        p
Before treatment    70    10.24 ± 0.36     Paired T test, t = -3.08    0.004
After treatment     70    10.59 ± 0.97
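The paired calculation described in this section (difference D for each individual, then the mean and standard deviation of D) can be sketched in Python as follows. The haemoglobin values are invented for illustration and are not taken from the course data; the function name is also illustrative.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Paired t statistic: mean difference divided by its standard error."""
    diffs = [a - b for a, b in zip(first, second)]  # D = first value - second value
    n = len(diffs)
    d_bar = mean(diffs)
    s_d = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    return d_bar / (s_d / math.sqrt(n)), n - 1


# Hypothetical haemoglobin readings for three patients.
before = [10.0, 11.0, 12.0]
after = [11.0, 13.0, 12.0]

t_value, df = paired_t(before, after)
print(round(t_value, 3), df)
```

With D defined as before minus after, a negative t means the values rose after treatment, which matches the negative t reported in the table above.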
Exercise 11: Analyse the file tbkkp.sav, on tuberculosis patients, using what you have learned.
Find the p value for patients on TB drug treatment who are categorised as cured and so on.
Source of variation    Sum of squares    Degrees of freedom    Mean square    F
Between groups               a                   c                 a/c         ad/bc
Within groups                b                   d                 b/d
The calculated F value is compared with the F table to determine whether the p value is above or below 0.05.
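To make the table concrete, the sketch below computes the between-groups and within-groups sums of squares, their degrees of freedom and the F ratio for a one-way ANOVA. The three small groups are invented for illustration, and the function name is not part of SPSS. In the notation of the table, a and b are the two sums of squares and c and d are the two degrees of freedom.

```python
def one_way_anova(groups):
    """Return (ss_between, df_between, ss_within, df_within, F) for a list of groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # a: variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # b: variation of the observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1               # c
    df_within = len(all_values) - len(groups)  # d
    f_ratio = (ss_between / df_between) / (ss_within / df_within)  # (a/c) / (b/d) = ad/bc
    return ss_between, df_between, ss_within, df_within, f_ratio


# Hypothetical data: three groups of three observations each.
ss_b, df_b, ss_w, df_w, f_ratio = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(ss_b, df_b, ss_w, df_w, f_ratio)
```

A large F means the group means differ by much more than the scatter within the groups would explain.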
3. In the dialog box that appears, enter the variables to be tested. In the "Dependent List" box, enter the quantitative variable to be tested (c5diasto) (see the figure below). More than one quantitative variable may be entered. In the "Factor" box, enter the qualitative variable (obesity), then click the Post Hoc button.
4. In the post hoc dialog box, tick LSD (see the figure) and click "Continue". Then click "Options", tick "Descriptives", click "Continue" and then "OK".
5. SPSS will then run the ANOVA and the "Output" window will appear showing the results of the analysis, as below.
The means in the descriptive table show that mean diastolic pressure rises with the level of obesity.
6. This shows F = 15.106 and p = 0.000 (SPSS prints 0.000 when p < 0.001). There is therefore a significant difference in diastolic blood pressure between the levels of obesity. The question is: between which groups does the difference lie?
7. The p value is <0.05 for all comparisons, so diastolic pressure differs significantly between every pair of obesity groups.
8. The table drawn up for the thesis report is as shown below.
Table 1: Mean diastolic pressure and obesity status.
Group          Mean ± SD         ANOVA         p
Underweight    78.20 ± 10.46     F = 15.106    <0.001
Normal         82.23 ± 14.05
Overweight     87.97 ± 11.75
Exercise 12: Find the p value for a study on obesity in the district of Setiu.
Open the file program-unggul.sav in the latihan folder. Dependent list: tchol; factor: fat category. Discuss your findings with your group members.
Correlation
Used to determine whether a relationship exists between 2 related (dependent) quantitative variables. The correlation coefficient (r) has a minimum value of -1 and a maximum value of +1.
-1 means a perfect negative correlation
+1 means a perfect positive correlation
0 means no correlation at all
General formula: r =
3. In the dialog box that appears, enter the variables to be tested. In the "Variables:" box, enter the quantitative variables to be tested, namely (c5diasto) and (bmi) (see the figure below). More than two quantitative variables may be entered. Then click "OK".
4. SPSS will then run the correlation analysis and the "Output" window will appear showing the results of the analysis, as below.
6. This shows r = 0.341 and p = 0.000 (that is, p < 0.001). There is therefore a significant relationship between BMI and diastolic blood pressure.
8. The table drawn up for the thesis report is as shown below.
Table 1: Correlation between diastolic pressure and BMI
Variables                     r        p
Diastolic pressure and BMI    0.341    <0.001
9. Note: in statistical studies the value of r can be interpreted as follows:
0.00 - 0.3 = weak correlation
0.3 - 0.6 = moderate correlation
0.6 - 1.00 = strong correlation
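The sketch below shows one common way to compute a Pearson correlation coefficient by hand and then apply the interpretation bands from the note above. It is an illustration only: the paired values are invented, the function names are not part of SPSS, and the treatment of the exact boundary values (0.3 counted as moderate, 0.6 as strong) is an assumption, since the note does not say which band they belong to.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def strength(r):
    """Classify |r| using the bands in the note (boundary handling is an assumption)."""
    size = abs(r)
    if size < 0.3:
        return "weak"
    if size < 0.6:
        return "moderate"
    return "strong"


# Hypothetical paired measurements with a perfect linear relationship.
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
print(round(r, 3), strength(r))
print(strength(0.341))  # the r reported above for BMI and diastolic pressure
```

The strength of a correlation and its significance are separate questions: r = 0.341 is only moderate, yet it is highly significant here because of the sample size.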
Exercise 13: Find the correlation between the mother's weight during pregnancy and the baby's birth weight.
Regression
Used to measure the functional relationship between 2 quantitative variables, where one variable is dependent and the other is independent. The formula used is:
$$ y = a + bx $$
where
$$ b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} $$
3. In the dialog box that appears, enter the variables to be tested. In the "Dependent:" box, enter the dependent quantitative variable, namely (c5diasto). In the "Independent" box, enter the independent quantitative variable, namely (bmi) (see the figure below). More than one independent quantitative variable may be entered. Several options are available under "Method"; in this example we use the "Enter" method. Then click "OK".
4. SPSS will then run the regression analysis and the "Output" window will appear showing the results of the analysis, as below.
6. Look at the last table, where a = 64.145 with p = 0.000 and b = 0.811 with p = 0.000 (that is, p < 0.001 in both cases). The formula y = a + bx can therefore be written as: diastolic pressure = 64.145 + (0.811)(BMI)
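Once the coefficients are known, the fitted equation can be used to estimate diastolic pressure for a given BMI. The short Python sketch below uses the a and b values reported above; the function name and the example BMI values are illustrative only.

```python
INTERCEPT = 64.145  # a, from the SPSS coefficients table
SLOPE = 0.811       # b, change in diastolic pressure per unit of BMI


def predict_diastolic(bmi):
    """Estimated diastolic pressure from y = a + bx."""
    return INTERCEPT + SLOPE * bmi


for bmi in (20, 25, 30):  # example BMI values chosen for illustration
    print(bmi, round(predict_diastolic(bmi), 2))
```

Each additional unit of BMI raises the estimated diastolic pressure by 0.811, so a BMI of 25 gives an estimate of about 84.4. Estimates for BMI values outside the range observed in the data should be treated with caution.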
Exercise 14: Run a logistic regression on the problem of maternal smoking and SGA babies.
These notes end here. The presentation may seem somewhat muddled; if that muddle has truly taken hold in your mind, it means you have grasped the true nature of statistics.
Chi-square (X2) test with Yates' correction; Student's t test; ANOVA; paired T test; Pearson correlation and linear regression
Continuous quantitative data
McNemar chi-square test; Wilcoxon rank-sum test or Mann-Whitney U test; Kruskal-Wallis one-way ANOVA; Wilcoxon signed-rank test; Spearman/Kendall rank correlation
Data not normally distributed; repeated measurements on the same individuals and the same item; data not normally distributed
Group discussion
Data collection
Study report
Additional NOTES