Data Process Improvement: Data Collection Is A Term Used To Describe A Process of Preparing and
Data collection is a term used to describe the process of preparing and collecting data, for example as part of a process improvement or similar project. The purpose of data collection is to obtain information to keep on record, to make decisions about important issues, or to pass information on to others. Primarily, data is collected to provide information regarding a specific topic.[1] Data collection usually takes place early in an improvement project and is often formalised through a data collection plan,[2] which often contains the following activities:
1. Pre-collection activity: agree on goals, target data, definitions and methods.
2. Collection: carry out the data collection.
3. Present findings: this usually involves some form of sorting,[3] analysis and/or presentation.
Prior to any data collection, pre-collection activity is one of the most crucial steps in the process. Researchers often discover too late that the value of their interview information is discounted as a consequence of poor sampling of both questions and informants and poor elicitation techniques.[4] After pre-collection activity is fully completed, data collection in the field, whether by interviewing or other methods, can be carried out in a structured, systematic and scientific way.
A formal data collection process is necessary because it ensures that the data gathered are both defined and accurate, and that subsequent decisions based on arguments embodied in the findings are valid.[5] The process provides both a baseline from which to measure and, in certain cases, a target for improvement.
Types of data collection
The main types of data collection include census, sample survey and administrative by-product, each with its own advantages and disadvantages. A census refers to data collection about everyone or everything in a group or population; its advantages include accuracy and detail, and its disadvantages include high cost and long duration. A sample survey is a data collection method that includes only part of the total population; its advantages are lower cost and shorter duration, and its disadvantages are reduced accuracy and detail. Administrative by-product data is collected as a by-product of an organization's day-to-day operations; its advantages include accuracy, time savings and simplicity, and its disadvantages include lack of flexibility and lack of control.[6]
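The trade-off between a census and a sample survey can be illustrated with a short Python sketch. It assumes a hypothetical population of household sizes generated at random, and it uses simple random sampling without replacement, which is only one of several possible sampling designs; the function names `census_mean` and `sample_survey_mean` are illustrative, not taken from any source.

```python
import random
import statistics

def census_mean(population):
    """Census: every unit of the population is measured."""
    return statistics.mean(population)

def sample_survey_mean(population, n, seed=None):
    """Sample survey: only n units are measured.

    Units are drawn by simple random sampling without replacement
    (an assumed design); their mean estimates the census mean.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, n)
    return sample, statistics.mean(sample)

# Hypothetical population: 1,000 household sizes between 1 and 8 (illustrative only).
rng = random.Random(42)
population = [rng.randint(1, 8) for _ in range(1000)]

sample, estimate = sample_survey_mean(population, n=50, seed=7)
print("census mean:", census_mean(population))
print("sample survey estimate (n=50):", estimate)
```

The census value requires measuring all 1,000 units, while the sample survey measures only 50 and accepts some sampling error in exchange, which mirrors the cost versus accuracy trade-off described above.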
0. Types of Data:
0.1 Primary Data: Data that are collected by an agency for the first time for a particular statistical investigation are said to be primary data.
0.2 Secondary Data: Data that have already been collected by one agency and are then taken over and used by another agency for its statistical work are termed secondary data. Put simply, if primary data collected for one statistical investigation are used in another statistical investigation, those data are called secondary data.
Data collection is the first step in a statistical investigation. Utmost care must be exercised in collecting data because they form the foundation of statistical method. If the data are faulty, the conclusions drawn from them can never be reliable.
2.1 Primary Data Collection Techniques
a) Observing Behaviors of Participants: This method specifies the conditions and methods for making observations. The information is sought through the investigator's own direct observation, without asking the respondent. The main advantage of this method is that subjective bias is eliminated, provided the observations are made accurately. It is the most commonly used method, especially in studies relating to behavioral science.
b) Questionnaire Method: Under this method a list of questions pertaining to the survey (known as a questionnaire) is prepared and sent to the various informants by post. The questionnaire contains the questions and provides space for the answers. A covering letter asks the informants to fill in the questionnaire and send it back within a specified time. The respondents have to answer the questions on their own. The questionnaire can be delivered by hand, by surface post or as an electronic questionnaire.
c) Interview Method: This involves listening to or interrogating informants. The interview method of collecting data involves the presentation of oral-verbal stimuli and replies in terms of oral-verbal responses. Under this method there is direct contact with the persons from whom the information is to be obtained, known as informants. The interviewer asks them questions pertaining to the survey and collects the desired information. The method can be used through personal interviews, telephone interviews, chat, audio conferencing, video conferencing, etc. Interviews may be i) structured, ii) semi-structured or iii) open.
d) Schedules Method: In this method of data collection, enumerators or interviewers specially appointed for the purpose take the schedules to the respondents, put to them the questions from the proforma in the order in which they are listed, and record the replies in the space provided in the proforma. In certain situations, schedules may be handed over to respondents and enumerators may help them record their answers to the various questions. The enumerator explains the aims and objectives of the investigation and also removes any difficulty respondents may have in understanding the implication of a particular question or the definition of a difficult term or concept.
This method has an advantage over the questionnaire method in that respondents have little scope to misunderstand a question and so give an irrelevant answer.
e) Information from Correspondents: Under this method, the investigator appoints local agents or correspondents in different places to collect information. These correspondents collect the information and transmit it to the central office, where the data are processed. The special advantage of this method is that it is cheap and appropriate for extensive investigations. However, it may not always give accurate results because of the personal prejudice and bias of the correspondents. Newspaper agencies generally adopt this method.
Besides the above methods, many large companies nowadays also use other methods for primary data collection, including warranty cards, distributor or store audits, consumer panels, projective techniques, depth interviews and content analysis, which are carried out with audio-visual aids or other electronic devices.
2.2 Secondary Data Collection Techniques: In most studies the investigator finds it impracticable to collect first-hand information on all related issues and therefore makes use of data collected by others. Secondary data can be collected from the following sources:
a) Published Sources: historical and other records, literature and proverbs.
b) Unpublished Sources.
If the data available in secondary sources are reliable, suitable and adequate, then one can use them for one's study.
3. Tools of Data Analysis
3.1 Statistical Software Packages
4. Techniques for Data Analysis: After the data have been collected and organized, they are ready for presentation. Data presented in an orderly manner also facilitate statistical analysis. Diagrams attract the human mind more than numerical figures do: they make the reader pause to glance at them and so gain an overall idea of the data. In practice a very wide variety of diagrams is in use, and new ones are constantly being added. Only the more frequently used diagrams are discussed below.
4.1 Statistical Methods: The notion of statistics was originally derived from the word "state": the word statistics comes from the Latin word status, the Italian word stato and the German word Statistik, each connected with the idea of a political state. The word statistics was first used by Professor Gottfried Achenwall in 1749 to refer to the subject matter as a whole. He defined statistics as "the political science of several countries". The word statistics conveys a variety of meanings to people when it is used in different contexts. Some of them are:
a) Statistics is an imposing form of mathematics.
b) Statistics suggests tables, charts and figures, and as such statistics are called numerical statements of facts.
c) Statistics refers to information about an activity or a process, whether production, population, national income, etc., expressed in numbers.
d) Statistics refers to a subject, like any other subject, which is a body of methods for obtaining and analysing data in order to make decisions based on them.
Statistics is usually not studied for its own sake; rather, it is widely employed as a highly valuable tool in the analysis of problems in the natural, physical and social sciences. Horace Secrist defines statistics as follows: "By statistics we mean aggregates of facts, affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standards of accuracy, collected in a systematic manner for a predetermined purpose and placed in relation to each other." According to Freund and Williams, as mentioned in Modern Business Statistics, statistics in general are nothing but a refinement of everyday thinking. They are especially appropriate for handling data that are subject to variation which cannot be fully controlled by experimental method, and for which only a fraction of the totality of observations that may exist is available.
The definition of the term statistics implies the following characteristics; in their absence, numerical data cannot be called statistics.
a) Statistics are aggregates of facts.
b) Statistics are affected to a marked extent by a multiplicity of causes.
c) Statistics are numerically expressed.
d) Statistics are enumerated or estimated according to reasonable standards of accuracy.
e) Statistics are collected in a systematic manner.
f) Statistics are collected for a predetermined purpose.
g) Statistics should be placed in relation to each other.
Statistical method is the systematic method used to organize, present, analyze and interpret large or small volumes of numerical information effectively; it is simply the method by which statistical data are analyzed. According to Croxton and Cowden, statistics may be defined as the collection, presentation, analysis and interpretation of numerical data; what their definition omits, however, is the organization of data. Statistical method is thus the science of the collection, organization, presentation, analysis and interpretation of numerical data, and these five stages are both essential and comprehensive for any statistical method. Statistical methods range from the most elementary descriptive devices, which can be understood by the common man, to complicated mathematical procedures that can be comprehended only by expert theoreticians.
4.1.1 One Dimensional
a) Bar Diagram
i) Simple Bar Diagram: To draw a simple bar diagram, equidistant bars of equal width are drawn on a line, one for each group of the data. The value of each group is represented by the height of the corresponding bar. Generally, vertical bars are drawn for time-based data and horizontal bars for space-based (or other) data. A simple bar diagram is used to represent only one variable.
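The construction rules for a simple bar diagram (equidistant bars of equal width, heights proportional to the values) can be expressed as a small layout calculation. The Python sketch below is an illustration under assumed parameters: `bar_width`, `gap` and `max_height` are hypothetical drawing units, all values are assumed to be positive, and the hypothetical helper `text_bars` renders horizontal bars in plain text instead of using a plotting library.

```python
def simple_bar_layout(values, bar_width=1.0, gap=0.5, max_height=10.0):
    """Return (x_left, width, height) for each bar of a simple bar diagram.

    Bars are equidistant and of equal width; all heights use one common
    scale, chosen so that the largest value gets a bar of height max_height.
    Assumes all values are positive.
    """
    scale = max_height / max(values)
    layout = []
    for i, v in enumerate(values):
        x_left = gap + i * (bar_width + gap)  # constant distance between bars
        layout.append((x_left, bar_width, v * scale))
    return layout

def text_bars(labels, values, width=40):
    """Print one horizontal bar per group, its length proportional to the value."""
    scale = width / max(values)
    for label, v in zip(labels, values):
        print(f"{label:>6} | {'#' * round(v * scale)} {v}")

# Hypothetical production figures for three years.
years = ["2019", "2020", "2021"]
output = [120, 150, 90]
layout = simple_bar_layout(output)
text_bars(years, output)
```

Using a single scale for every bar is what keeps the bar heights in the same ratio as the values they represent.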
ii) Sub-divided Bar Diagram: A sub-divided bar diagram is used when the total magnitude of a variable is to be divided into various parts or components. It is drawn in the same way as a simple bar diagram, except that each bar is divided into segments according to the given components of the total.
iii) Multiple Bar Diagram: To represent two or more numerical characteristics in the same diagram, a multiple bar diagram is used. It is obtained by drawing a number of equidistant sets of vertical bars on a line. Each set contains two or more adjacent bars. All bars have the same width, and the heights of the bars are taken in the ratio of the numerical figures they denote. The number of sets of bars equals the number of items.
iv) Percentage Bar: Percentage bars are particularly useful in statistical work that requires the portrayal of relative changes in data. In such diagrams the length of every bar is kept equal to 100, and segments are cut in these bars to represent the components (as percentages) of an aggregate.
v) Deviation Bar: Deviation bars are popularly used for representing net quantities (excess or deficit), such as net profit, net loss, net exports or net imports. Such bars can have both positive and negative values: positive values are shown above the base line and negative values below it.
4.1.2 Two Dimensional: In two-dimensional diagrams both the length and the width of the bars are considered, so the area of each bar represents the given data. Two-dimensional diagrams are also known as surface diagrams or area diagrams. The important types in this category are:
a) Rectangles: In constructing rectangles one may represent the figures as they are given, or convert them to percentages and then subdivide the length into the various components. Since the area of a rectangle is the product of its length and width, both length and width matter when constructing it.
b) Squares: The rectangle method of diagrammatic presentation does not look good when the values of the items vary widely. To overcome this difficulty the square method is used. In this method one takes the square roots of the values of the various items to be shown and then selects a suitable scale to draw the squares.
c) Circles: With circles, both the total and the component parts or sectors can be shown.
Since the area of a circle is proportional to the square of its radius, the square roots of the various figures are worked out when constructing circles, and the radii of the circles drawn are proportional to those square roots.
d) Pie Diagram: To construct a pie diagram, the values of the various components of the data are converted into corresponding degrees of the circle (out of 360), and the circle is divided into sectors accordingly; the resulting figure is known as a circle or pie diagram. The number of sectors equals the number of component parts, and the areas of the sectors are in the ratio of the values of the constituent parts.
4.1.3 Three Dimensional: Three-dimensional diagrams are also known as volume diagrams. In such diagrams three things, namely length, width and height, have to be taken into account. They are used where the difference between the smallest and the largest values is very large.
a) Cubes: Among three-dimensional diagrams, cubes are the most popular and also the simplest to draw. The side of a cube is drawn in proportion to the cube root of the magnitude of the data.
b) Cylinder:
c) Spheres:
4.1.4 Others
a) Average: One of the most important objectives of statistical analysis is to obtain a single value that describes the characteristics of an entire mass of unwieldy data; such a value is called a central value or an average. Croxton and Cowden define an average as follows: "An average value is a single value within the range of the data that is used to represent all of the values in the series." Since an average lies somewhere within the range of the data, it is sometimes called a measure of central value.
Objectives of Averaging: There are two main objectives of the study of averages: i) to describe the characteristics of the entire group, i.e. to get a bird's-eye view of the entire data; and ii) to facilitate comparison.
Characteristics of an Average: An average should have the following properties or requisites: i) it is easy to understand; ii) it is simple to compute; iii) it includes all the items.
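The proportionality rules stated above for percentage bars, squares, circles, pie diagrams and cubes reduce to a few lines of arithmetic. In this Python sketch the function names are hypothetical and the component values are invented for illustration; `side_lengths` is simply a compact way of expressing that a figure whose size grows in k dimensions should be drawn with a linear size proportional to the k-th root of the value.

```python
def percentage_bar(components):
    """Percentage bar: a bar of length 100 cut into one segment per component.

    Returns the (start, end) position of each segment on the 0-100 scale.
    """
    total = sum(components)
    segments, start = [], 0.0
    for c in components:
        end = start + 100.0 * c / total
        segments.append((start, end))
        start = end
    return segments

def pie_angles(components):
    """Pie diagram: each component's share of the 360 degrees of the circle."""
    total = sum(components)
    return [360.0 * c / total for c in components]

def side_lengths(values, dimensions):
    """Linear size to draw so that the figure's size is proportional to each value.

    dimensions=1: bar length, proportional to the value itself
    dimensions=2: side of a square or radius of a circle (square root),
                  so that the area is proportional to the value
    dimensions=3: side of a cube (cube root), so that the volume is proportional
    """
    return [v ** (1.0 / dimensions) for v in values]

# Hypothetical monthly household expenditure: food, rent, other.
spending = [50, 30, 20]
print(percentage_bar(spending))
print(pie_angles(spending))
print(side_lengths([4, 9], 2))    # squares or circles for values 4 and 9
print(side_lengths([8, 27], 3))   # cubes for values 8 and 27
```

For values 4 and 9 the squares get sides in the ratio 2:3, so their areas keep the original 4:9 ratio; drawing sides in the ratio 4:9 instead would exaggerate the difference.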
Types of Average
i) Arithmetic Mean: The arithmetic mean is obtained by adding together all the items and dividing the total by the number of items. It is represented by the symbol x̄ (read as "x bar"). The arithmetic mean of a set of data is the sum of the values divided by the number of observations.
Calculating the Arithmetic Mean: For ease of calculation, observations can be divided into the following groups.
Arithmetic Mean of Individual Observations: Individual observations are observations for which frequencies are not given. Two methods are used for them:
Direct Method: The values of the variable are added together and the total is divided by the number of items.
Short-cut Method: The arithmetic mean is calculated using an arbitrary origin (A), with deviations taken from this arbitrary origin.
Arithmetic Mean of a Discrete Series: Can be calculated by the direct method or the short-cut method.
Arithmetic Mean of a Continuous Series: Can be calculated by the direct method, the short-cut method, or the step-deviation (change of origin and scale) method, in which a common factor is taken out of the deviations and the result is multiplied back by that common factor.
Weighted Arithmetic Mean: This method is used when the relative importance of the different items varies.
Merits of Arithmetic Mean: The arithmetic mean is the most widely used average in practice, for the following reasons:
a) It is the simplest average to understand and the easiest to compute in comparison with the median and mode.
b) It is affected by the value of every item in the series.
c) It is defined by a rigid mathematical formula, so everyone who computes it gets the same result.
d) The mean is typical in the sense that it is the centre of gravity, balancing the values on either side of it.
e) It is a calculated value and not based on position in the series.
f) Being determined by a rigid formula, it lends itself to subsequent algebraic treatment better than the median or mode.
g) It is relatively reliable in the sense that it does not vary too much when repeated samples are taken from the same population.
Limitations of Arithmetic Mean:
a) It is Unduly Affected by Extreme Items: Since the value of the mean depends on every item of the series, extreme items, i.e. very small and very large items, unduly affect it.
b) Distributions with Open-End Classes: In a distribution with open-end classes, the mean cannot be computed without making assumptions about the class intervals of the open-end classes. If such classes contain a large proportion of the values, the mean may be subject to substantial error.
c) Not Always a Good Measure of Central Tendency: The mean provides a characteristic value, in the sense of indicating where most of the values lie, only when the distribution of the variable is reasonably normal (bell-shaped). In the case of a U-shaped distribution the mean is not likely to serve a useful purpose.
ii) Median
iii) Mode
iv) Geometric Mean
v) Harmonic Mean
Besides the above types there are also other types of averages, e.g. progressive average, moving average, etc.
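The methods of calculating the arithmetic mean described above can be checked against each other in a short Python sketch. The function names, the arbitrary origin A and the common factor h in the example are illustrative choices, and the data are hypothetical; the formulas are the usual ones: short-cut x̄ = A + Σd/n with d = x - A, and step deviation x̄ = A + h·Σfd'/Σf with d' = (m - A)/h for class mid-points m. The last lines contrast the mean with the median from Python's `statistics` module when one extreme item is present.

```python
from statistics import median

def mean_direct(xs):
    """Direct method: add all the items and divide by the number of items."""
    return sum(xs) / len(xs)

def mean_short_cut(xs, a):
    """Short-cut method: deviations d = x - A from an arbitrary origin A;
    mean = A + sum(d) / n."""
    deviations = [x - a for x in xs]
    return a + sum(deviations) / len(xs)

def mean_discrete(values, freqs, a=0):
    """Discrete series (or class mid-points): mean = A + sum(f * (x - A)) / sum(f).
    With the default A = 0 this is the direct method sum(f * x) / sum(f)."""
    n = sum(freqs)
    return a + sum(f * (x - a) for x, f in zip(values, freqs)) / n

def mean_step_deviation(midpoints, freqs, a, h):
    """Continuous series, step-deviation method: d' = (m - A) / h for class
    mid-points m and common factor h; mean = A + h * sum(f * d') / sum(f)."""
    n = sum(freqs)
    steps = [(m - a) / h for m in midpoints]
    return a + h * sum(f * d for f, d in zip(freqs, steps)) / n

def mean_weighted(xs, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(xs, weights)) / sum(weights)

# Individual observations (hypothetical marks).
marks = [12, 15, 18, 20, 25]
print(mean_direct(marks), mean_short_cut(marks, a=20))

# Continuous series with classes 0-10, 10-20, 20-30, 30-40 (hypothetical).
mids, freqs = [5, 15, 25, 35], [3, 5, 8, 4]
print(mean_discrete(mids, freqs), mean_step_deviation(mids, freqs, a=25, h=10))

# Weighted mean: a score of 80 counted three times as heavily as a score of 60.
print(mean_weighted([80, 60], [3, 1]))

# One extreme item pulls the mean far more than the median.
data = [10, 12, 11, 13, 100]
print(mean_direct(data), median(data))
```

All three routes for the continuous series give the same mean, which is the point of the short-cut and step-deviation methods: they only change the origin and scale of the arithmetic, not the result.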
An average is a single value representing a group of values, so it must be properly interpreted; otherwise there is every possibility of jumping to wrong conclusions. An average that is unduly affected by extreme items loses its value. Further, an average should be rigidly defined, capable of further algebraic treatment, and should have sampling stability.
4.1.5 Geographical Information System (GIS):
4.1.6 Distributions, Variances, Correlations:
4.2 Transforming Qualitative Information into Quantitative Data:
a) Pictographs: Pictures are attractive and easy to comprehend, so this method is particularly useful for presenting statistics to the layman. In a pictograph the data are represented through a pictorial symbol, which is very carefully selected so that the pictograph depicts the kind of data being dealt with.
b) Cartogram: Cartograms, or statistical maps, are used to give quantitative information on a geographical basis; they are thus used to represent spatial distributions. The quantities on the map can be shown in many ways, such as through shades or colors, by dots, by placing pictograms in each geographical unit, or by placing the appropriate numerical figures in each geographical unit.
c) Graphs:
i) Graph of Time Series: When we observe the values of a variable at different points of time, the series so formed is known as a time series.
Line Diagrams: Time-based data can be represented by a line diagram. Points are plotted on graph paper by taking time as the X coordinate and the value corresponding to that time as the Y coordinate; successive points are then joined by line segments to draw the line diagram.
Graphs of Frequency Distribution: A frequency distribution can be presented graphically in any of the following ways.
d) Histogram: A histogram consists of a series of adjacent vertical rectangles, one drawn on each class interval. The area of each rectangle represents the frequency of that class.
Generally, a histogram is used for the graphical representation of the frequency distribution of a continuous variable. To draw a histogram, class intervals are first marked along the horizontal axis (X-axis) and frequencies along the vertical axis (Y-axis). Then, taking the difference between the lower and upper boundaries of each class as the base, one rectangle is drawn for each class so that the areas of the rectangles are in the ratio of the frequencies. Since the areas of rectangles with the same base are proportional to their heights, for a frequency distribution with equal class widths the heights of the rectangles are simply taken in the ratio of the frequencies.
e) Frequency Polygon: To draw a frequency polygon, points are plotted on the coordinate plane by taking the mid-value of each class as the X coordinate and the corresponding frequency as the Y coordinate. Successive points are then joined by line segments. The polygon is closed at both ends by extending it to the mid-points of two classes with zero frequency, one before the first class and one after the last class.
f) Smoothed Frequency Curve: The smoothed frequency curve is drawn freehand in such a way that the area under the curve is approximately the same as that of the polygon. The object of drawing a smoothed frequency curve is to eliminate, as far as possible, accidental variations that might be present in the data.
g) Cumulative Frequency Curves or Ogives: A cumulative frequency curve is a smooth curve. To draw it, points are plotted on graph paper by taking the upper class boundaries as X coordinates and the cumulative frequencies of the respective classes as Y coordinates. The points so obtained are joined by a smooth freehand curve, which is joined at its start to the lower class boundary of the first class. The smooth curve drawn in this manner is called the cumulative frequency curve.
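The plotting rules for histograms, frequency polygons and cumulative frequency curves amount to computing a list of heights or points before anything is drawn. In this Python sketch the data are hypothetical, classes are given as a sorted list of boundaries (one more boundary than there are classes), and the histogram heights are frequency densities (frequency divided by class width) so that areas stay proportional to frequencies even when class widths differ; with equal widths the heights are simply in the ratio of the frequencies. The function names are illustrative.

```python
def histogram_heights(boundaries, freqs):
    """Height of each histogram rectangle: frequency / class width, so that the
    rectangle's area (height * width) is proportional to the class frequency."""
    return [f / (hi - lo) for lo, hi, f in zip(boundaries, boundaries[1:], freqs)]

def frequency_polygon(boundaries, freqs):
    """Points (class mid-value, frequency), closed at both ends by zero-frequency
    points at the mid-values of the empty classes just before the first class
    and just after the last class (assumed to have the same widths as the
    first and last classes)."""
    mids = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
    first_width = boundaries[1] - boundaries[0]
    last_width = boundaries[-1] - boundaries[-2]
    return ([(mids[0] - first_width, 0)]
            + list(zip(mids, freqs))
            + [(mids[-1] + last_width, 0)])

def less_than_ogive(boundaries, freqs):
    """Points (upper class boundary, cumulative frequency), starting from
    (lower boundary of the first class, 0)."""
    points, running = [(boundaries[0], 0)], 0
    for upper, f in zip(boundaries[1:], freqs):
        running += f
        points.append((upper, running))
    return points

# Hypothetical frequency distribution with classes 0-10, 10-20, 20-30, 30-40.
bounds, freqs = [0, 10, 20, 30, 40], [3, 5, 8, 4]
print(histogram_heights(bounds, freqs))
print(frequency_polygon(bounds, freqs))
print(less_than_ogive(bounds, freqs))
```

The points returned by `frequency_polygon` are joined by straight segments, whereas the ogive points are joined by a smooth freehand curve; the smoothed frequency curve has no exact formula in the text and is not computed here.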
Summary: Previously all statistical analysis was done manually, but nowadays the advent of computers makes it possible to analyse data with general as well as specialised software packages. General software packages include MS Excel and similar programs, whereas specialised packages include SPSS.
References
Converse, Jean & Presser, Stanley (1986). Survey questions: Handcrafting the standardized questionnaire. Newbury Park, CA: SAGE Publications.
Gupta, S. P. (1983). Statistical method, 18th rev. ed. New Delhi: Sultan Chand and Sons. E3.1-E3.22, E6.1-E6.58, E7.1-E7.83.
Kothari, C. R. (1990). Research methodology: Methods and techniques, 2nd ed. New Delhi: New Age. 95-121, 132-134.
Kumar, P. S. G. (2004). A students manual of library and information science, 2nd ed. Delhi: B.R. Publishing Corporation. 357-368, 371-384, 454-456.
Kumar, P. S. G. Research methods and , 385-396.
Mikkelsen, Britha (2005). Methods for development work and research: A new guide for practitioners, 2nd ed. New Delhi: SAGE Publications.
Patton, Michael Quinn (1990). Qualitative evaluation methods. USA: SAGE Publications.
Patton, Michael Quinn (2002). Qualitative research and evaluation methods, 3rd ed. Thousand Oaks: SAGE Publications.
Polonsky, Michael Jay & Waller, David S. (2004 & 2005). Designing and managing a research project: A business student's guide. New Delhi: Response Books.
Rubin, H. J. (1983). Applied social research. Columbus, OH: Charles E. Merrill. Quoted in Mikkelsen, Britha (2005). Methods for development work and research: A new guide for practitioners, 2nd ed. New Delhi: SAGE Publications.
Tripathi, S. M., Lal, C. & Kumar, K. (2002). Descriptive questions in library and information science. New Delhi: Ess Ess. 346-372.
http://www.gfmer.ch/Activites_internationales_Fr/Laos/PDF/Data_collection_tecniques_Chaleunvong_Laos_2009.pdf