
BAM125 - Notes

Data Processing I

Uploaded by Iorlaha Samuel
Copyright © All Rights Reserved

What is a Computer?

The literal meaning of a computer is a machine that can calculate. However, modern
computers are no longer just calculating devices; they can perform a wide variety of
tasks. In simple terms, a computer is a programmable electronic machine used to store,
retrieve, and process data.

According to a common definition, "A computer is a programmable electronic device
that takes data, performs instructed arithmetic and logical operations, and gives the
output."

Whatever is given to the computer as input is called 'data', while the output received
after processing is called 'information'.

A Brief History of Computers

The term 'computer' was in use by the 1640s, meaning 'one who calculates'. It was
derived from the Latin word 'computare', which meant 'to calculate'. From 1897, the
word was also used to mean 'calculating machine'. From 1945, it came to mean
'programmable digital electronic computer', which is its modern meaning.

Early electronic computers were very large; some filled an entire room or more, and
many were built with large vacuum tubes. The idea of a computing machine is older
still: in the 1820s and 1830s, Charles Babbage (known as the father of the computer)
worked on an early mechanical calculator called the 'Difference Engine'. In 1837, he
described the 'Analytical Engine', the first design for a mechanical, general-purpose
computer. Over time, computers have become far more powerful and much smaller.

Generations of Computers
Computers are commonly classified into five generations:
First Generation (1946 - 1959): First-generation computers were based on electronic
valves (vacuum tubes). Well-known examples include ENIAC, EDVAC, and UNIVAC.
Second Generation (1959 - 1965): Second-generation computers were based on
transistors. Well-known examples include the IBM 1400, IBM 1620, and IBM 7000 series.
Third Generation (1965 - 1971): Third-generation computers were based on integrated
circuits (ICs). Well-known examples include the IBM 360, IBM 370, and PDP.
Fourth Generation (1971 - 1980): Fourth-generation computers were based on very
large scale integrated (VLSI) circuits. Well-known examples include the STAR 1000,
CRAY-1, CRAY-X-MP, and DEC 10.
Fifth Generation (1980 - Present): The fifth generation is still ongoing. Its computers
are based on several technologies, such as ultra large scale integration (ULSI),
artificial intelligence (AI), and parallel processing hardware. Examples include
desktops, laptops, and notebooks.
Classification of Computers
By physical size, computers are classified into the following types:
Supercomputer: Supercomputers are the fastest and most expensive type of computer.
They are large and need a lot of space for installation. They are mainly designed to
perform complex tasks on massive amounts of data, and they can execute trillions of
instructions per second.
Mainframe Computer: Mainframe computers are smaller than supercomputers but are
still large machines. They are designed to run hundreds or thousands of jobs
simultaneously. They can handle heavy workloads, including complex calculations, and
can store vast amounts of data. They are best suited to large organizations, such as
those in the banking, telecom, and education sectors.
Microcomputer: Microcomputers are inexpensive, general-purpose computers designed
mainly for individual users and their everyday tasks. Because they are slower than
mainframe computers, they are suitable for small organizations. They are widely used
in internet cafés, schools, universities, and offices. A microcomputer is commonly
called a 'Personal Computer (PC)'. Laptops and desktops are examples of
microcomputers.
Minicomputer: Minicomputers, also called midrange computers, are midsize
multiprocessing computers. They are smaller and cheaper than mainframes, take up less
space, and can support several users at once. They are suitable for billing,
accounting, education, and business purposes in small and medium-sized
organizations. DEC's PDP-8 and PDP-11 are classic examples of minicomputers.
Workstation: A workstation is a powerful, single-user computer. It is like a personal
computer but with a faster microprocessor, a large amount of RAM, higher-quality
monitors, more graphics memory, and so on. It is best suited to specialized
professional tasks. Depending on the tasks it is built for, a workstation can be called
a music workstation, graphic workstation, or engineering design workstation. Many
businesses and professionals use workstations for tasks like animation, music
creation, video editing, poster design, data analysis, and more.
Types of Computer
Computers can be categorized in two ways: by data-handling capability and by size.
By data-handling capability, there are three types of computer:

o Analogue Computer
o Digital Computer
o Hybrid Computer

1) Analogue Computer
Analogue computers are designed to process analogue data. Analogue data varies
continuously rather than taking discrete values. Analogue computers are therefore used
where exact values are not always needed, such as when measuring speed, temperature,
pressure, or current.
Analogue computers accept data directly from a measuring device without first
converting it into numbers or codes. They measure continuous changes in a physical
quantity and usually give their output as a reading on a dial or scale. A speedometer
and a mercury thermometer are examples of analogue computers.
Advantages of using analogue computers:

o It allows real-time operation and computation at the same time, with continuous
representation of all data within the range of the analogue machine.
o In some applications, it can perform calculations without transducers to convert
the inputs or outputs to digital electronic form and vice versa.
o The programmer can scale the problem to the dynamic range of the analogue
computer. This gives insight into the problem and helps in understanding the errors
and their effects.
Types of analogue computers:

o Slide Rules: The slide rule is one of the simplest types of mechanical analogue
computer. It was developed to perform basic mathematical calculations. It is made of
two rods marked with scales. To perform a calculation, one rod is slid along the other
until the markings line up.
o Differential Analysers: The differential analyser was developed to solve
differential equations. It performs integration using wheel-and-disc mechanisms.
o Castle Clock: The castle clock was invented by Al-Jazari. It could store
programming instructions. It was about 11 feet tall and displayed the time, the zodiac,
and the solar and lunar orbits. It also allowed users to set the length of the day
according to the current season.
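The slide rule works because its scales are logarithmic: since log(a × b) = log a + log b, sliding one scale along the other adds two lengths, which multiplies the numbers they stand for. The Python sketch below simulates an idealized slide rule with exact base-10 scales; the function name `slide_rule_multiply` is hypothetical, and a real slide rule is read by eye, so its results are only approximate.

```python
import math


def slide_rule_multiply(a: float, b: float) -> float:
    """Multiply two positive numbers the way an idealized slide rule does.

    Each number sits at a distance proportional to its base-10 logarithm,
    so lining the scales up adds the two distances; reading the scale at
    the combined distance converts the sum back into the product.
    """
    if a <= 0 or b <= 0:
        raise ValueError("slide rule scales only cover positive numbers")
    distance_a = math.log10(a)           # position of a on the fixed rod
    distance_b = math.log10(b)           # offset added by the sliding rod
    combined = distance_a + distance_b   # sliding adds the two lengths
    return 10 ** combined                # read the product off the scale


print(round(slide_rule_multiply(2, 3), 6))    # 6.0
print(round(slide_rule_multiply(1.5, 4), 6))  # 6.0
```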

2) Digital Computer
A digital computer is designed to perform calculations and logical operations at high
speed. It accepts raw data as input in the form of digits or binary numbers (0 and 1)
and processes it with programs stored in its memory to produce the output. All modern
computers that we use at home or in the office, such as laptops, desktops, and
smartphones, are digital computers.
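To make the idea of binary input concrete, the Python sketch below shows how a number and a short piece of text look as 0s and 1s. It assumes 8-bit groups and UTF-8 text encoding, which are common choices rather than the only ones; `to_binary` and `text_to_binary` are illustrative helper names.

```python
def to_binary(value: int, width: int = 8) -> str:
    """Return the base-2 digits of a non-negative integer, padded to `width` bits."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    return format(value, f"0{width}b")


def text_to_binary(text: str) -> list[str]:
    """Encode text as UTF-8 bytes and show each byte as 8 binary digits."""
    return [to_binary(byte) for byte in text.encode("utf-8")]


print(to_binary(13))         # 00001101
print(text_to_binary("Hi"))  # ['01001000', '01101001']
```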

Advantages of digital computers:
o It allows you to store a large amount of information and retrieve it easily
whenever you need it.
o New features can be added to digital systems easily.
o Different applications can be run on the same digital system just by changing the
program, without making any changes to the hardware.
o Hardware costs are lower thanks to advances in IC technology.
o It offers high speed because the data is processed digitally.
o It is highly reliable because it can use error-correcting codes.
o Results are highly reproducible because the output is not affected by noise,
temperature, humidity, or other properties of its components.
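Error-correcting codes are one way digital systems achieve this reliability. The notes do not name a specific code; as an illustration, the Python sketch below implements the classic Hamming(7,4) code, which adds three parity bits to four data bits and can locate and correct any single flipped bit. The function names are hypothetical.

```python
def hamming74_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(code: list[int]) -> list[int]:
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(code)  # copy so the caller's list is not modified
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s4  # 1-based position, 0 = no error
    if error_position:
        c[error_position - 1] ^= 1         # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]]


data = [1, 0, 1, 1]
codeword = hamming74_encode(data)
print(codeword)                     # [0, 1, 1, 0, 0, 1, 1]
corrupted = codeword.copy()
corrupted[4] ^= 1                   # simulate noise flipping bit 5
print(hamming74_decode(corrupted))  # [1, 0, 1, 1]
```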

3) Hybrid Computer
A hybrid computer combines features of analogue and digital computers. It is fast
like an analogue computer and has the memory and accuracy of a digital computer. It
can process both continuous and discrete data. It accepts analogue signals and
converts them into digital form before processing. For example, the processor in a
petrol pump converts measurements of fuel flow into quantity and price.
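The petrol pump example can be sketched in Python to show the two halves of a hybrid system: an analogue sensor signal is first converted into digital numbers, and digital arithmetic then turns those numbers into quantity and price. Every figure here is an assumption made for illustration (a 0 to 5 V flow sensor, a 10-bit converter, one sample per second, 50 litres per minute at full scale), and `adc_convert` and `dispense` are hypothetical names, not part of any real pump controller.

```python
# Illustrative assumptions, not values from any real pump.
ADC_BITS = 10                 # resolution of the analogue-to-digital converter
V_REF = 5.0                   # sensor voltage at maximum flow (volts)
MAX_FLOW_L_PER_MIN = 50.0     # flow rate represented by V_REF
SAMPLE_INTERVAL_S = 1.0       # time between readings (seconds)
ADC_MAX = 2 ** ADC_BITS - 1   # largest digital code (1023 for 10 bits)


def adc_convert(voltage: float) -> int:
    """Analogue to digital: quantize a sensor voltage to an integer code."""
    clamped = min(max(voltage, 0.0), V_REF)  # stay within the converter's range
    return round(clamped / V_REF * ADC_MAX)


def dispense(voltages: list[float], price_per_litre: float) -> tuple[float, float]:
    """Digital part: turn sampled flow readings into litres and total price."""
    litres = 0.0
    for voltage in voltages:
        code = adc_convert(voltage)
        flow_l_per_min = code / ADC_MAX * MAX_FLOW_L_PER_MIN
        litres += flow_l_per_min * SAMPLE_INTERVAL_S / 60  # flow x time = volume
    return round(litres, 3), round(litres * price_per_litre, 2)


# One minute of full flow at a hypothetical price of 1.50 per litre.
print(dispense([5.0] * 60, price_per_litre=1.50))  # (50.0, 75.0)
```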

Advantages of using hybrid computers:
o Its computing speed is very high because of the all-parallel configuration of the
analogue subsystem.
o It produces precise and quick results that are more accurate and useful.
o It can solve and manage large equations in real time.
o It helps in online data processing.
Advantages of Using Computers
The following are the main advantages of using the computer:

o Computers can perform given tasks at incredible speed.
o Computers can perform the same task many times with the same accuracy.
o Computers can do several tasks simultaneously, as they are well suited to
multitasking.
o Computers can keep stored data secure and inaccessible to unauthorized users.
o Computers can perform routine tasks automatically, freeing humans for tasks that
need more intelligence.
o Computers are cost-effective because they reduce paperwork. Today most people
prefer to work on a computer rather than on paper, which saves both time and money.

Disadvantages of Using Computers
The following are the main disadvantages of using the computer:
o Computers cannot work on their own. They need instructions from humans to
complete tasks, and they follow those instructions blindly without considering the
outcomes.
o Computers need a power supply to work. Without one, they cannot be used.
o Working on a computer continuously for long periods can cause several health
issues.
o Discarded computers and their parts have a negative impact on the environment.
o Computers are taking over human jobs in many sectors, which can increase
unemployment.
Computer Software and Hardware
Hardware
The physical parts that together make up a computer are called hardware or hardware
components. There are different types of hardware, depending on the computer's
structure. The most common hardware components include the mouse, keyboard,
monitor, and printer. These are the parts that can be seen and touched.

Basic Parts of Computer
The essential components of computer hardware are as follows:
Input Unit: Input units or devices are used to enter data or instructions into the
computer. The most common input devices are the mouse and the keyboard.
Output Unit: Output Units or devices are used to provide output to the user in the
desired format. The most popular examples of output devices are the monitor and the
printer.
Control Unit: As its name states, this unit controls all of the computer's functions.
All the components or devices attached to a computer interact with each other through
the control unit. The control unit is abbreviated 'CU'.
Arithmetic Logic Unit: The arithmetic logic unit performs all of the computer system's
arithmetic and logical operations. The arithmetic logic unit is abbreviated 'ALU'.
Memory: Memory is used to store all the input data, instructions, and output data.
Memory is usually of two types: primary memory and secondary memory. Primary
memory, such as RAM, is directly accessed by the CPU, whereas secondary memory,
such as a hard disk, is not directly accessed by the CPU and is used for long-term
storage.
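How these units cooperate can be illustrated with a toy accumulator machine. This Python sketch is a simplified teaching model, not how any real CPU is built: the instruction names LOAD, ADD, SUB, AND, and STORE are hypothetical, a dictionary stands in for primary memory, and `print` stands in for the output unit.

```python
def alu(operation: str, a: int, b: int) -> int:
    """Arithmetic logic unit: carry out one arithmetic or logical operation."""
    if operation == "ADD":
        return a + b
    if operation == "SUB":
        return a - b
    if operation == "AND":
        return a & b
    raise ValueError(f"unknown operation: {operation}")


def control_unit(program: list[tuple[str, str]], memory: dict[str, int]) -> dict[str, int]:
    """Control unit: step through the program and direct memory and the ALU."""
    accumulator = 0                          # working register
    for opcode, address in program:          # fetch and decode each instruction
        if opcode == "LOAD":                 # copy a value from memory
            accumulator = memory[address]
        elif opcode == "STORE":              # copy the result back to memory
            memory[address] = accumulator
        else:                                # arithmetic/logic goes to the ALU
            accumulator = alu(opcode, accumulator, memory[address])
    return memory


memory = {"x": 7, "y": 5, "result": 0}       # input data placed in memory
program = [("LOAD", "x"), ("ADD", "y"), ("STORE", "result")]
print(control_unit(program, memory)["result"])  # output unit: prints 12
```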

The Monitor:
A monitor is the display unit of a computer, on which processed data such as text and
images is displayed. It comprises the screen, the circuitry, and the case that encloses
this circuitry. The monitor is also known as a visual display unit (VDU).
Types of Monitors:
1. CRT Monitor: It uses a cathode ray tube to produce images from video signals. Its
main components are the electron gun assembly, deflection plate assembly, glass
envelope, fluorescent screen, and base.
2. LCD Monitor: It is a flat-panel screen that uses liquid crystal display technology
to produce images. Advanced LCDs have thin-film transistors with capacitors and use
active-matrix technology, which allows pixels to retain their charge.
3. LED Monitor: It is an advanced version of the LCD monitor. Whereas an LCD
monitor uses cold cathode fluorescent lamps to backlight the display, an LED monitor
uses panels containing many LEDs to provide the backlight.
4. Plasma Monitor: It uses plasma display technology, which allows it to produce
resolutions of up to 1920 x 1080, a wide viewing angle, a high refresh rate,
outstanding contrast ratio, and more.

Keyboard:
The keyboard is one of the most important input devices of a computer. It allows you
to input text, numbers, special characters, and other commands into a computer,
desktop, tablet, etc. It comes with different sets of keys to enter numbers and
characters and to perform various other functions like copy, paste, delete, and enter.
It is a hardware device that is connected to or built into the computer, and it serves
as the user's most fundamental interface with a system.

Types of Keyboards:
1. QWERTY Keyboards
2. AZERTY Keyboards
3. DVORAK Keyboards

Mouse:
It is a small handheld device designed to control or move the pointer (the computer
screen's cursor) in a GUI (graphical user interface). It allows you to point to or select
objects on a computer's display screen. It is generally placed on a flat surface, since
it needs to be moved smoothly to control the pointer. Types: Trackball Mouse,
Mechanical Mouse, Optical Mouse, Wireless Mouse, etc.
A mouse can be wired or wireless. It is a portable pointing device used to interact
with objects on the screen by moving the cursor around.

Main functions of a mouse:
o Move the cursor: Moving the cursor on the screen is the main function of the
mouse.
o Open or execute a program: It allows you to open a folder or document and
execute a program. To do so, move the cursor over the item and double-click it.
o Select: It allows you to select text, files, or any other object.
o Hovering: Hovering means moving the mouse cursor over a clickable object. While
the cursor hovers over an object, information about the object may be displayed
without pressing any mouse button.
o Scroll: It allows you to scroll up or down while viewing a long webpage or
document.
Parts of a mouse:
o Two buttons: A mouse has two buttons, for left-click and right-click.
o Scroll Wheel: A wheel between the left and right buttons, used to scroll up and
down and, in some applications such as AutoCAD, to zoom in and out.
o Battery: A battery is required in a wireless mouse.
o Motion Detection Assembly: A mouse can have a trackball or an optical
sensor to provide signals to the computer about the motion and location of the
mouse.

Software
Software, abbreviated as SW or S/W, is a set of programs that enables the hardware
to perform specific tasks. All the programs that run on the computer are software.
Software can be of three types: system software, application software, and
programming software.
1) System Software
System software is the main software that runs the computer. When you turn on the
computer, it activates the hardware and controls and coordinates its functioning. It
also controls application programs. An operating system is an example of system
software.

i) Operating System:
An operating system is system software that acts as an interface enabling the user
to communicate with the computer. It manages and coordinates the functioning of the
computer's hardware and software. Commonly used operating systems are Microsoft
Windows, Linux, and Apple macOS (formerly Mac OS X).

Some other examples of system software include:

(i) BIOS: BIOS stands for Basic Input/Output System. It is a type of system software
stored in Read Only Memory (ROM) on the motherboard; in modern computer systems, it
is stored in flash memory. BIOS is the first software that runs when you turn on your
computer. It loads the drivers for the hard disk into memory and helps the operating
system load itself into memory.
(ii) Boot Program: Booting refers to starting up a computer. When you switch on the
computer, the commands in ROM are executed automatically to load the boot program
into memory and execute its instructions.
(iii) Assembler: An assembler acts as a converter: it takes basic computer instructions
and converts them into patterns of bits. The processor uses these bits to perform basic
operations.
(iv) Device driver: This system software controls a hardware device connected to a
computer. It enables the computer to use the hardware by providing an appropriate
interface.
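The assembler's job can be illustrated with a toy example. The Python sketch below assumes a hypothetical 8-bit instruction format (a 4-bit operation code followed by a 4-bit memory address) and an invented instruction set; real processors have their own, much larger instruction sets, but the translation from mnemonic to bit pattern follows the same idea.

```python
# Hypothetical instruction set: a 4-bit operation code followed by a 4-bit address.
OPCODES = {"LOAD": 0b0001, "ADD": 0b0010, "STORE": 0b0011, "HALT": 0b1111}


def assemble(source: str) -> list[str]:
    """Translate one instruction per line into 8-bit machine-code patterns."""
    machine_code = []
    for line in source.strip().splitlines():
        parts = line.split()
        if not parts:                                   # skip blank lines
            continue
        mnemonic = parts[0].upper()
        address = int(parts[1]) if len(parts) > 1 else 0  # HALT takes no address
        if mnemonic not in OPCODES:
            raise ValueError(f"unknown instruction: {mnemonic}")
        if not 0 <= address <= 15:
            raise ValueError(f"address does not fit in 4 bits: {address}")
        word = (OPCODES[mnemonic] << 4) | address       # opcode in the high 4 bits
        machine_code.append(format(word, "08b"))
    return machine_code


program = """
LOAD 5
ADD 6
STORE 7
HALT
"""
print(assemble(program))
# ['00010101', '00100110', '00110111', '11110000']
```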

2) Application Software:
Application software is a set of programs designed to perform specific tasks for end
users; it does not control the working of the computer. Common types include:
o Word Processing Software: This software allows users to create, edit, format, and
manipulate text, and more. It offers many options for writing documents, inserting
images, and so on. Examples are MS Word, WordPad, and Notepad.
o Spreadsheet Software: It is designed to perform calculations, store data, and
create charts. It has rows and columns, and data is entered in cells, each of which is
the intersection of a row and a column, e.g., Microsoft Excel.
o Multimedia Software: This software is designed for editing and playing video,
audio, and text. It allows you to combine text, video, audio, and images. Examples are
VLC media player and Windows Media Player.
o Enterprise Software: This software is designed for business operations, such as
accounting, billing, and order processing. Examples are CRM (Customer Relationship
Management), BI (Business Intelligence), and ERP (Enterprise Resource Planning)
software.

3) Programming Software:
Programming software is a collection of tools that help developers write other
software or programs. It assists them in creating, debugging, and maintaining software,
programs, or applications. Some of these tools, such as compilers and interpreters,
translate programs written in languages such as Java, C++, and Python into a form the
computer can execute. Because it is aimed at developers, programming software is not
generally used by end users. Examples include compilers, linkers, debuggers,
interpreters, and text editors.

Hardware vs. software
Hardware describes the physical parts of the computer, or its delivery mechanisms,
that hold and execute the software's written instructions. Software, the intangible
component of the system, enables the user to communicate with the hardware and give
commands to perform specific tasks. Computer software includes:
o the OS and its associated tools;
o applications that regulate particular computer operations;
o programs that generally operate on data provided by the user.
Because virtual (on-screen) keyboards are not physical keyboards, they are considered
software on mobile devices and laptop computers.
Software must be developed to work properly with the hardware, because both are
necessary for a computer to produce usable output, and each depends on the other.
Malware (malicious software) such as worms, spyware, viruses, and Trojan horses can
have a significant impact on a system's software and operating system. Malware
usually does not affect hardware directly.

Operating System: Definition and Function
An Operating System can be defined as an interface between the user and the
hardware. It is responsible for executing all processes, resource allocation, CPU
management, file management, and many other tasks.

The purpose of an operating system is to provide an environment in which a user can
execute programs in a convenient and efficient manner.

Structure of a Computer System
A Computer System consists of:
o Users (people who are using the computer)
o Application Programs (compilers, databases, games, video players, browsers, etc.)
o System Programs (shells, editors, compilers, etc.)
o Operating System (a special program that acts as an interface between user and
hardware)
o Hardware (CPU, disks, memory, etc.)

What Does an Operating System Do?
1. Process Management
2. Process Synchronization
3. Memory Management
4. CPU Scheduling
5. File Management
6. Security

PRINCIPLES & METHODS OF DATA PROCESSING

Data processing means collecting raw data and translating it into usable information.
The raw data is collected, filtered, sorted, processed, analyzed, stored, and then
presented in a readable format. In an organization, it is usually performed step by
step by a team of data scientists and data engineers.

Data processing can be carried out manually or automatically. Nowadays, most data is
processed automatically by computers, which is faster and gives more accurate
results. Processed data can take different forms, such as graphics or audio, depending
on the software and the data processing methods used.
The collected data is processed and then translated into whatever form the task
requires. Data can be acquired from Excel files, databases, and text files, as well as
from unstructured data such as audio clips, images, GPRS data, and video clips.
Data processing is crucial for organizations to create better business strategies and
increase their competitive edge. By converting data into readable formats such as
graphs, charts, and documents, employees throughout the organization can
understand and use the data.
Data processing largely depends on the following factors:
o The volume of data that needs to be processed.
o The complexity of the data processing operations.
o The capacity and built-in technology of the computer systems used.
o Technical skills and time constraints.
Stages of Data Processing
Data processing consists of the following six stages:

1. Data Collection
The collection of raw data is the first step of the data processing cycle. The raw data
collected has a huge impact on the output produced. Hence, raw data should be
gathered from defined and accurate sources so that the subsequent findings are valid
and usable. Raw data can include monetary figures, website cookies, profit/loss
statements of a company, user behavior, etc.
2. Data Preparation
Data preparation or data cleaning is the process of sorting and filtering the raw data
to remove unnecessary and inaccurate data. Raw data is checked for errors,
duplication, miscalculations, or missing data and transformed into a suitable form for
further analysis and processing. This ensures that only the highest quality data is fed
into the processing unit.
3. Data Input
In this step, the raw data is converted into machine-readable form and fed into the
processing unit. This can be in the form of data entry through a keyboard, scanner, or
any other input source.
4. Data Processing
In this step, the input data is subjected to various data processing methods, which
may include machine learning and artificial intelligence algorithms, to generate the
desired output. This step may vary slightly from process to process depending on the
source of the data being processed (data lakes, online databases, connected devices,
etc.) and the intended use of the output.
5. Data Interpretation or Output
The data is finally transmitted and displayed to the user in a readable form like
graphs, tables, vector files, audio, video, documents, etc. This output can be stored
and further processed in the next data processing cycle.
6. Data Storage
The last step of the data processing cycle is storage, where data and metadata are
stored for further use. This allows quick access and retrieval of information whenever
needed. Proper data storage is also necessary for compliance with data protection
legislation such as the GDPR.
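The six stages can be sketched as a small Python pipeline, one function per stage. The sales records, field names, and cleaning rules below are invented for illustration; a real pipeline would read from an actual source (a database, files, sensors) and apply rules suited to its own data.

```python
import csv
import io
import json
from collections import defaultdict

# Stage 1 input: hypothetical raw sales records. They deliberately contain a
# duplicate row, a missing amount, and an amount that is not a number.
RAW_CSV = """region,amount
North,120
South,
North,120
East,abc
South,80
East,45
"""


def collect(raw_text: str) -> list[dict[str, str]]:
    """1. Data collection: read the raw records from their source."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def prepare(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """2. Data preparation: drop duplicate, missing, and invalid records.
    Assumes identical rows are duplicates and amounts are whole numbers."""
    seen = set()
    clean = []
    for row in rows:
        key = tuple(row.values())
        if key in seen or not row["amount"].isdigit():
            continue
        seen.add(key)
        clean.append(row)
    return clean


def to_machine_readable(rows: list[dict[str, str]]) -> list[tuple[str, int]]:
    """3. Data input: convert text fields into typed values."""
    return [(row["region"], int(row["amount"])) for row in rows]


def process(records: list[tuple[str, int]]) -> dict[str, int]:
    """4. Data processing: total the sales for each region."""
    totals = defaultdict(int)
    for region, amount in records:
        totals[region] += amount
    return dict(totals)


def output(totals: dict[str, int]) -> str:
    """5. Output: present the result as a small readable table."""
    return "\n".join(f"{region:<6}{total:>5}" for region, total in sorted(totals.items()))


def store(totals: dict[str, int]) -> str:
    """6. Storage: serialize the result so it can be saved and retrieved later."""
    return json.dumps(totals, sort_keys=True)


totals = process(to_machine_readable(prepare(collect(RAW_CSV))))
print(output(totals))
print(store(totals))  # {"East": 45, "North": 120, "South": 80}
```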

Why Should We Use Data Processing?

In the modern era, most work relies on data, so large amounts of data are collected
for purposes such as academic and scientific research, institutional use, personal and
private use, and commercial use. This data must be processed so that it goes through
all the steps above and is sorted, stored, filtered, presented in the required format,
and analyzed.
The time taken and the complexity of processing depend on the required results.
Where large amounts of data are acquired, for example in data mining and research,
processing is essential to obtain reliable results.

Methods of Data Processing

There are three main data processing methods:

1. Manual Data Processing
In this method, data is processed manually. The entire procedure of data collection,
filtering, sorting, calculation, and other logical operations is carried out by people,
without any electronic device or automation software. It is a low-cost method that
needs few tools. However, it is error-prone, has high labor costs, and takes a lot of
time.
2. Mechanical Data Processing
Data is processed mechanically using devices and machines. These can include
simple devices such as calculators, typewriters, and printing presses. Simple data
processing operations can be achieved with this method. It produces far fewer errors
than manual data processing, but the growth in the volume of data has made this
method more complex and difficult.
3. Electronic Data Processing
Data is processed with modern technologies using data processing software and
programs. The software gives a set of instructions to process the data and yield
output. This method is the most expensive but provides the fastest processing speeds
with the highest reliability and accuracy of output.
Types of Data Processing Techniques
There are different types of data processing, depending on the source of the data and
the steps the processing unit takes to generate an output. There is no
one-size-fits-all method for processing raw data.

1. Batch Processing: In this type of data processing, data is collected and processed
in batches. It is used for large amounts of data, for example in a payroll system.

2. Single User Programming Processing: This is usually done by a single person for
personal use. This technique is also suitable for small offices.

3. Multiple Programming Processing: This technique allows more than one program to
be stored and executed in the Central Processing Unit (CPU) at the same time. Data is
broken down into frames and processed using two or more CPUs within a single
computer system. It is also known as parallel processing. Multiple programming
techniques increase the computer's overall working efficiency. Weather forecasting is
a good example of multiple programming processing.

4. Real-time Processing: This technique lets the user interact directly with the
computer system, which makes data processing easier. It is also known as the direct
mode or interactive mode technique and is developed to perform one specific task. It
is a form of online processing that is always running. For example, withdrawing money
from an ATM.

5. Online Processing: This technique allows data to be entered and processed
directly, rather than being stored or accumulated first and processed later. It is
designed to reduce data entry errors, as it validates data at various points and
ensures that only correct data is entered. It is widely used in online applications, for
example barcode scanning.

6. Time-sharing Processing: This is another form of online data processing, which
allows several users to share the resources of an online computer system. It is
adopted when results are needed quickly. As the name suggests, this system is
time-based. Some of the major advantages of time-sharing processing are:

o Several users can be served simultaneously.
o All the users get an almost equal amount of processing time.
o Users can interact with the running programs.
7. Distributed Processing: This is a specialized data processing technique in which
various remotely located computers are interconnected with a single host computer,
forming a network of computers. All these computer systems are interconnected by a
high-speed communication network, while the central computer system maintains the
master database and monitors the network. This facilitates communication between
the computers.
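The key difference between batch and online processing is when the data is processed. The Python sketch below contrasts the two using the notes' own examples, a payroll system and barcode scanning. The employees, hourly rates, barcodes, and prices are hypothetical, and prices are kept in cents to avoid floating-point rounding errors with money.

```python
# Hypothetical data for illustration only.
HOURLY_RATE = {"ada": 20.0, "ben": 18.0}        # pay per hour
PRICES_IN_CENTS = {"1001": 250, "1002": 120}    # barcode -> price


def run_payroll(timesheets: list[tuple[str, float]]) -> dict[str, float]:
    """Batch processing: timesheets collected during the pay period are
    processed together, in one run, at the end of the period."""
    pay: dict[str, float] = {}
    for employee, hours in timesheets:
        pay[employee] = pay.get(employee, 0.0) + hours * HOURLY_RATE[employee]
    return pay


def scan_items(barcodes: list[str]) -> int:
    """Online processing: each barcode is validated and processed as soon as
    it is entered, so the running total is always up to date."""
    total = 0
    for code in barcodes:
        if code not in PRICES_IN_CENTS:         # validate at the point of entry
            print(f"unknown barcode {code}: please rescan")
            continue
        total += PRICES_IN_CENTS[code]
        print(f"scanned {code}, running total {total / 100:.2f}")
    return total


print(run_payroll([("ada", 8), ("ben", 6), ("ada", 4)]))  # {'ada': 240.0, 'ben': 108.0}
scan_items(["1001", "9999", "1002"])
```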
Examples of Data Processing
Data processing occurs in our daily lives, whether or not we are aware of it. Here
are some real-life examples of data processing:

o Stock trading software that converts millions of stock data points into a simple
graph.
o An e-commerce company uses the search history of customers to recommend
similar products.
o A digital marketing company uses demographic data of people to strategize
location-specific campaigns.
o A self-driving car uses real-time data from sensors to detect if there are
pedestrians and other cars on the road.

PRINCIPLES OF DATA MANAGEMENT

Various Methods of Data Collection
Data collection is the process of gathering, measuring, and analyzing accurate data
from a variety of relevant sources to find answers to research problems, answer
questions, evaluate outcomes, and forecast trends and probabilities. Accurate data
collection is necessary to make informed business decisions, ensure quality, and
maintain research integrity.

1) Primary Data Collection:
Primary data collection involves the collection of original data directly from the source
or through direct interaction with the respondents. This method allows researchers to
obtain firsthand information specifically tailored to their research objectives. There
are various techniques for primary data collection, including:
a. Surveys and Questionnaires: Researchers design structured questionnaires or
surveys to collect data from individuals or groups. These can be conducted through
face-to-face interviews, telephone calls, mail, or online platforms.
b. Interviews: Interviews involve direct interaction between the researcher and the
respondent. They can be conducted in person, over the phone, or through video
conferencing. Interviews can be structured (with predefined questions), semi-
structured (allowing flexibility), or unstructured (more conversational).
c. Observations: Researchers observe and record behaviors, actions, or events in
their natural setting. This method is useful for gathering data on human behavior,
interactions, or phenomena without direct intervention.
d. Experiments: Experimental studies involve the manipulation of variables to
observe their impact on the outcome. Researchers control the conditions and collect
data to draw conclusions about cause-and-effect relationships.
e. Focus Groups: Focus groups bring together a small group of individuals who
discuss specific topics in a moderated setting. This method helps in understanding
opinions, perceptions, and experiences shared by the participants.
2. Secondary Data Collection:
Secondary data collection involves using existing data that someone else collected,
usually for a purpose other than the current research. Researchers analyze and
interpret this data to extract relevant information. Secondary data can be obtained
from various sources, including:
a. Published Sources: Researchers refer to books, academic journals, magazines,
newspapers, government reports, and other published materials that contain relevant
data.
b. Online Databases: Numerous online databases provide access to a wide range of
secondary data, such as research articles, statistical information, economic data, and
social surveys.
c. Government and Institutional Records: Government agencies, research
institutions, and organizations often maintain databases or records that can be used
for research purposes.
d. Publicly Available Data: Data shared by individuals, organizations, or
communities on public platforms, websites, or social media can be accessed and
utilized for research.
e. Past Research Studies: Previous research studies and their findings can serve as
valuable secondary data sources. Researchers can review and analyze the data to
gain insights or build upon existing knowledge.

Methods of Data Preparation


What is Data Preparation?

Data preparation is the process of gathering, combining, cleaning, and transforming
raw data so that it can be used to make accurate predictions in machine learning
projects.

Data preparation is also known as "data pre-processing," "data wrangling," "data
cleaning," and "feature engineering." It is the stage of the machine learning
lifecycle that comes after data collection.

The data preparation pipeline consists of the following steps:
1. Access the data.
2. Ingest (or fetch) the data.
3. Cleanse the data.
4. Format the data.
5. Combine the data.
6. Analyze the data.
Access
There are many sources of business data within any organization. Examples include
endpoint data, customer data, marketing data, and all their associated repositories.
This first essential data preparation step involves identifying the necessary data and
its repositories. The goal is not to list every possible data source and repository,
but only those applicable to the desired analysis.
Ingest
Once the data is identified, it needs to be brought into the analysis tools. The data will
likely be some combination of structured and semi-structured data in different types
of repositories. Importing it all into a common repository is necessary for the
subsequent steps in the pipeline.
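As a minimal sketch of the ingest step, the Python below assumes two hypothetical sources, a structured CSV export and a semi-structured JSON feed, both already available as text, and loads them into one common list of records. The function name `ingest` and the `source` tag are illustrative assumptions, not part of any particular tool.

```python
import csv
import io
import json


def ingest(csv_text, json_text):
    """Load a CSV source and a JSON source into one common list of dicts.

    Hypothetical helper: each record is tagged with the source it came from.
    """
    records = []
    # Structured data: each CSV row becomes a dict keyed by the header row.
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append({"source": "csv", **row})
    # Semi-structured data: a JSON array of objects whose fields may vary.
    for obj in json.loads(json_text):
        records.append({"source": "json", **obj})
    return records


customers_csv = "id,name\n1,Ada\n2,Ben\n"
web_json = '[{"id": "3", "name": "Cy", "referrer": "search"}]'
for record in ingest(customers_csv, web_json):
    print(record)
```

In practice the common repository would usually be a database or data lake rather than a Python list; the list only keeps the sketch self-contained.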
Cleanse
Cleansing the data ensures that the data set can provide valid answers when the data
is analyzed.

There are many possible problems with the ingested data. There could be missing
values, out-of-range values, nulls, and whitespace that obscures values, as well as
outlier values that could skew analysis results.
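Several of these cleansing problems can be handled in code. The sketch below assumes a hypothetical numeric field (an age) with an assumed valid range; it strips whitespace, drops nulls and blanks, rejects non-numeric text and discards out-of-range values. Outlier detection needs a statistical rule of its own and is not shown.

```python
def cleanse(records, field, low, high):
    """Return copies of the records whose `field` holds a number in [low, high].

    Hypothetical helper for illustration; the field name and range are assumptions.
    """
    clean = []
    for rec in records:
        raw = rec.get(field)
        if raw is None:                 # null or missing field
            continue
        text = str(raw).strip()         # remove whitespace that hides the value
        if text == "":                  # blank once whitespace is removed
            continue
        try:
            value = float(text)
        except ValueError:              # not a number at all
            continue
        if not low <= value <= high:    # out-of-range value
            continue
        clean.append({**rec, field: value})
    return clean


raw_rows = [{"age": " 34 "}, {"age": ""}, {"age": None}, {"age": "abc"}, {"age": "150"}]
print(cleanse(raw_rows, "age", 0, 120))  # [{'age': 34.0}]
```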
Format
Once the data set has been cleansed, it needs to be formatted. This step includes
resolving issues like multiple date formats in the data or inconsistent abbreviations. It
is also possible that some data variables are not needed for the analysis and should
therefore be deleted from the analysis data set. This is another data preparation step
that will benefit from automation.
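A small sketch of automating the format step follows. It assumes the input dates arrive either day-first (DD/MM/YYYY) or in ISO form, that a hypothetical mapping of city abbreviations is the standard to follow, and that only three variables are needed for the analysis; all of these names and formats are illustrative.

```python
from datetime import datetime

# Assumed input date formats (day-first and ISO); a real data set may differ.
DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d")
# Hypothetical mapping from inconsistent abbreviations to one standard spelling.
CITY_NAMES = {"Lag": "Lagos", "Lag.": "Lagos", "Abj": "Abuja"}


def to_iso_date(text):
    """Convert a date written in any of DATE_FORMATS to YYYY-MM-DD."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def format_record(rec, keep=("date", "city", "amount")):
    """Keep only the needed variables, then standardize dates and city names."""
    out = {key: rec[key] for key in keep if key in rec}
    if "date" in out:
        out["date"] = to_iso_date(out["date"])
    if "city" in out:
        out["city"] = CITY_NAMES.get(out["city"], out["city"])
    return out


print(format_record({"date": "05/03/2024", "city": "Lag", "amount": "200", "notes": "x"}))
```

Listing the accepted formats explicitly matters: "05/03/2024" means 5 March under the day-first assumption but 3 May under MM/DD/YYYY, so the order of DATE_FORMATS is a decision the data owner has to make.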
Combine
When the data set has been cleansed and formatted, it may be transformed by
merging, splitting, or joining the input sets. Once the combining step is complete, the
data is ready to be moved to the Data Warehouse staging area. Once data is loaded
into the staging area, there is a second opportunity for validation.
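Joining is the most common way to combine inputs. The sketch below is a simple inner join on a shared key, assuming hypothetical customer and balance records and a key that is unique within the second data set.

```python
def join_on(left, right, key):
    """Inner join: merge each left record with the right record that has the same key.

    Assumes the key is unique within `right`; left records with no match are dropped.
    """
    index = {rec[key]: rec for rec in right}
    joined = []
    for rec in left:
        match = index.get(rec[key])
        if match is not None:
            joined.append({**rec, **match})
    return joined


customers = [{"customer_id": 1, "name": "Ada"}, {"customer_id": 2, "name": "Ben"}]
balances = [{"customer_id": 1, "balance": 500}]
print(join_on(customers, balances, "customer_id"))
```

A left join would keep Ben with a missing balance instead of dropping him; which kind of join is appropriate depends on the analysis.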
Analyze
Once the analysis has begun, changes to the data set should only be made with
careful consideration. During analysis, algorithms are often adjusted and compared to
other results.

Data Validation Techniques


Data validation is a critical aspect of data management. It ensures that data entered
into a system is accurate, consistent, and meets the standards set for that specific
system.

What is Data Validation?


Data validation refers to the process of ensuring the accuracy and quality of
data. It is implemented by building several checks into a system or report to
ensure the logical consistency of input and stored data.
Several common techniques can be used to validate data, including data type
validation, range validation, length validation, format validation, and check digit
validation.
• Data Type Validation: This technique checks if the data entered into the
system is of the correct data type, such as a string, integer, or date.
• Range Validation: This technique checks if the data entered into the system
falls within a specific range of values, such as a customer's age between 18
and 65 years old.
• Length Validation: This technique checks if the data entered into the system
meets a length requirement, such as a password with at least 8 characters.
• Format Validation: This technique checks if the data entered into the system
follows a specific format, such as a date in the MM/DD/YYYY format.
• Check Digit Validation: This technique uses a mathematical algorithm to
validate data, such as the check digit at the end of a credit card number.
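Each of the five techniques can be expressed as a small check. The Python sketch below uses the examples from the list (an age from 18 to 65, a password of at least 8 characters, an MM/DD/YYYY date) and, for check digits, the Luhn algorithm that payment card numbers use. The function names are hypothetical.

```python
import re
from datetime import datetime


def is_integer(value):
    """Data type validation: True for int values (bool is excluded)."""
    return isinstance(value, int) and not isinstance(value, bool)


def in_range(age, low=18, high=65):
    """Range validation: the age must lie between low and high inclusive."""
    return low <= age <= high


def long_enough(password, minimum=8):
    """Length validation: the password needs at least `minimum` characters."""
    return len(password) >= minimum


def is_mm_dd_yyyy(text):
    """Format validation: two-digit month and day, four-digit year, real date."""
    if not re.fullmatch(r"[0-9]{2}/[0-9]{2}/[0-9]{4}", text):
        return False
    try:
        datetime.strptime(text, "%m/%d/%Y")
    except ValueError:
        return False
    return True


def luhn_valid(card_number):
    """Check digit validation using the Luhn algorithm (spaces are ignored)."""
    digits = card_number.replace(" ", "")
    if not re.fullmatch(r"[0-9]{2,}", digits):
        return False
    total = 0
    for position, ch in enumerate(reversed(digits)):
        digit = int(ch)
        if position % 2 == 1:       # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(is_integer(42), in_range(17), long_enough("secret"),
      is_mm_dd_yyyy("12/31/2024"), luhn_valid("7992 7398 713"))
```

The regular expression in `is_mm_dd_yyyy` is needed because `strptime` alone would also accept "1/5/2024"; the `strptime` call then rejects impossible dates such as month 31.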

Data Transmission Methods


Data transmission in a computer happens when bits are sent from one system to
another either through radio waves (wirelessly) or through cables.
The rate at which data is transferred is known as the bit rate, measured in bits per
second (bps). It is the number of bits that can be transferred in one second: the more
bits per second a connection can carry, the faster it is.
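Because bit rate is measured in bits per second, the time to send a file is its size in bits divided by the bit rate. A small sketch, assuming a constant bit rate, 8 bits per byte, and no protocol overhead (real transfers are somewhat slower):

```python
def transfer_time_seconds(size_bytes, bit_rate_bps):
    """Seconds needed to send `size_bytes` bytes at a constant `bit_rate_bps`.

    Assumes 8 bits per byte and ignores protocol overhead and retransmissions.
    """
    if bit_rate_bps <= 0:
        raise ValueError("bit rate must be positive")
    return size_bytes * 8 / bit_rate_bps


# A 5 MB file (5,000,000 bytes) over a 10 Mbps (10,000,000 bps) link:
print(transfer_time_seconds(5_000_000, 10_000_000))  # 4.0
```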
Types of Data Transmission
Key Terms:
• Serial – sending data one bit at a time
• Parallel – sending multiple bits simultaneously
Serial Data Transmission
Serial transmission uses a single wire to transfer the data bits. Each bit, whether it
is a 1 or a 0, is sent one after the other, which makes serial transmission a lot
cheaper than its parallel counterpart. Serial transmission can be used to send data
over long distances and is used more frequently than parallel transmission.
Let us consider an example: if we have an 8-bit byte of data and send it over serial
transmission, the eight bits travel along the same wire one after the other, one bit in
each time interval.
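To make the serial example concrete, the Python sketch below lists the order in which the 8 bits of one byte would leave on a single wire and rebuilds the byte at the receiver. Sending the most significant bit first is an assumption made for readability; many real serial links (UART, for example) send the least significant bit first. The function names are hypothetical.

```python
def to_serial_bits(byte):
    """Return the bits of one byte in the order they leave on a single wire.

    Assumes most-significant bit first; many real serial links (e.g. UART)
    send the least-significant bit first instead.
    """
    if not 0 <= byte <= 255:
        raise ValueError("expected a value from 0 to 255")
    return [(byte >> shift) & 1 for shift in range(7, -1, -1)]


def from_serial_bits(bits):
    """Rebuild the byte at the receiver from bits arriving one after another."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


bits = to_serial_bits(ord("A"))        # "A" is 65 = 0b01000001
for tick, bit in enumerate(bits):      # one bit per clock tick on one wire
    print(f"tick {tick}: {bit}")
print(from_serial_bits(bits))          # 65
```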

Parallel Data Transmission


Parallel transmission uses several wires to send the bits simultaneously. In our
previous example, instead of sending 8 bits one after the other, it may use 8 separate
wires that each carry 1 bit at the same time. A much faster transmission rate can be
achieved using this method.
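For comparison, a minimal sketch of parallel transmission maps a byte onto 8 wires at once and counts clock ticks. It assumes the serial and parallel links run at the same clock rate and carry exactly one bit per wire per tick; the function names are illustrative.

```python
def to_parallel_lines(byte):
    """Map one byte onto 8 wires: wire i carries bit i, all during the same tick."""
    if not 0 <= byte <= 255:
        raise ValueError("expected a value from 0 to 255")
    return {wire: (byte >> wire) & 1 for wire in range(8)}


def ticks_needed(n_bytes, wires):
    """Clock ticks to move n_bytes when `wires` bits travel per tick."""
    total_bits = n_bytes * 8
    return (total_bits + wires - 1) // wires   # round up to whole ticks


print(to_parallel_lines(ord("A")))   # {0: 1, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 1, 7: 0}
print(ticks_needed(100, 1), ticks_needed(100, 8))   # 800 100
```

Under these assumptions, 8 wires move the same data in one eighth of the ticks, which is where the speed advantage of parallel transmission comes from.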

Modes of Transmission
Key Terms:
• Simplex – sending data in one direction only
• Full-duplex – sending data in both directions at the same time
• Half-duplex – sending data in both directions, but in only one direction at a time
Simplex
In this transmission mode, data is sent in one direction only. Think of it like your
television receiving a broadcast.
Full-Duplex
In this transmission mode, data is sent in both directions at the same time. Think of it
like a normal telephone call on your mobile phone.
Half-Duplex
In this transmission mode, data is sent in both directions, but in only one direction at
a time. Think of it like a walkie-talkie radio, where only one person can speak at a time.
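The three modes differ only in which transmissions may happen at the same time, which a short rule can capture. The sketch below models a link between two hypothetical stations A and B and assumes a simplex link only ever carries data from A to B, like a broadcast to a television.

```python
from enum import Enum


class Mode(Enum):
    SIMPLEX = "simplex"
    HALF_DUPLEX = "half-duplex"
    FULL_DUPLEX = "full-duplex"


def may_transmit(mode, direction, active):
    """Decide whether a new transmission may start on a link between A and B.

    direction: "A->B" or "B->A"; active: set of directions currently in use.
    Assumes a simplex link only ever carries A->B (like a broadcast to a TV).
    """
    if mode is Mode.SIMPLEX:
        return direction == "A->B"
    if mode is Mode.HALF_DUPLEX:
        # Only one direction at a time, like a walkie-talkie.
        return all(d == direction for d in active)
    return True  # full-duplex: both directions at once, like a phone call


print(may_transmit(Mode.SIMPLEX, "B->A", set()))         # False
print(may_transmit(Mode.HALF_DUPLEX, "B->A", {"A->B"}))  # False
print(may_transmit(Mode.FULL_DUPLEX, "B->A", {"A->B"}))  # True
```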

The Need for Data Protection
Data protection is important because it protects an organization's information from
fraud, hacking, phishing, and identity theft. Any organization that wants to work
effectively needs to ensure the safety of its information by implementing a data
protection plan.
Ultimately, the key principle of data protection is safeguarding data from different
threats and under different circumstances.

Key Elements of Data Protection


One very important data protection model is the CIA triad, where the three letters of
the name represent the three elements of data protection: confidentiality, integrity,
and availability. This model was created to help individuals and organizations take a
holistic approach to data protection. The three elements are defined as follows:
• Confidentiality: The data can be accessed only by authorized users with
appropriate credentials.
• Integrity: All the data stored within an organization is reliable, accurate, and not
subject to any unauthorized changes.
• Availability: The stored data is safely and readily available whenever it is needed.
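Integrity is the element most easily shown in code. The sketch below stores a SHA-256 digest of a hypothetical record when it is written and later checks whether the record still matches it. This assumes the stored digest itself is protected; against deliberate tampering, keyed digests (HMAC) or digital signatures are normally used, because anyone who can change the data could otherwise recompute a plain hash.

```python
import hashlib
import hmac


def fingerprint(data):
    """Return a SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data, stored_fingerprint):
    """Integrity check: True only if the data still matches its stored digest."""
    return hmac.compare_digest(fingerprint(data), stored_fingerprint)


record = b"account=1001;balance=500"
saved = fingerprint(record)                               # stored when the record is written
print(is_unchanged(record, saved))                        # True
print(is_unchanged(b"account=1001;balance=900", saved))   # False
```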

Data Protection Best Practices


There are different data protection management practices. Some of the most
commonly used include:

• Data loss prevention (DLP): A set of tools and processes used to protect data
from theft, loss, misuse, deletion, or other illegal or inappropriate forms of
access.
• Firewalls: Tools used to monitor and filter network traffic to ensure data is
transferred or accessed only by authorized users.
• Encryption: Transforming data with an algorithm so that it can be turned back
into readable form only with the right password or key.
• Data erasure: Deleting data that is no longer needed or relevant.
• Data backups: Making multiple copies of data on different storage devices or
media, such as a separate physical disk or the cloud.
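Firewall filtering can be sketched as a first-match rule list. In the Python below, the rule format, the example addresses and the default-deny policy are all assumptions for illustration; real firewalls have their own rule languages and inspect much more than a source address and port.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str              # "allow" or "deny"
    network: str             # source addresses in CIDR form, e.g. "192.168.1.0/24"
    port: Optional[int]      # destination port, or None to match any port


def filter_packet(rules, src_ip, dst_port, default="deny"):
    """Return the action of the first rule matching the packet, else `default`."""
    addr = ipaddress.ip_address(src_ip)
    for rule in rules:
        in_network = addr in ipaddress.ip_network(rule.network)
        port_matches = rule.port is None or rule.port == dst_port
        if in_network and port_matches:
            return rule.action
    return default


rules = [
    Rule("deny", "192.168.1.13/32", None),   # block one known-bad machine
    Rule("allow", "192.168.1.0/24", 443),    # allow the office LAN to use HTTPS
]
print(filter_packet(rules, "192.168.1.20", 443))   # allow
print(filter_packet(rules, "192.168.1.13", 443))   # deny
print(filter_packet(rules, "8.8.8.8", 443))        # deny (no rule matches)
```

Rule order matters: because the first match wins, the narrow "deny" rule must come before the broader "allow" rule that would otherwise admit the blocked machine.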

DATA COMMUNICATION AND ITS APPLICATIONS


Data communication is the electronic exchange of data between two devices across a
communication channel, such as a twisted-pair cable or a fiber-optic cable.
In computing, data communication allows electronic or digital data to be sent
between two or more devices regardless of their geographical location, transmission
medium, or data content.
The message, sender, receiver, transmission medium, and protocol are all crucial
elements in data communication.
Data communication has four critical characteristics that are as follows:
• Delivery
• Accuracy
• Timeliness
• Jitter

► Delivery: Data must be sent in the correct order from the source device to the
correct destination.

► Accuracy: The information must be supplied without errors. The data should be
retransmitted if there is any inaccuracy during transmission.

► Timeliness: Data must be delivered within the required timeframe. Data that is
delivered late may be useless.

► Jitter: Jitter is the variation in packet arrival times, caused by uneven or unexpected delays.
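One widely used way to put a number on jitter is the interarrival jitter estimator defined for RTP in RFC 3550: for each packet it measures how much the transit delay changed from the previous packet and folds that change into a running average with weight 1/16. The sketch below follows that formula but, as a simplifying assumption, takes hypothetical send and receive times in milliseconds rather than RTP timestamp units.

```python
def interarrival_jitter(send_ms, recv_ms):
    """Smoothed jitter estimate in the style of RFC 3550 (RTP), in milliseconds.

    send_ms[i] and recv_ms[i] are when packet i was sent and received.
    """
    if len(send_ms) != len(recv_ms):
        raise ValueError("need one send time and one receive time per packet")
    jitter = 0.0
    for i in range(1, len(send_ms)):
        prev_transit = recv_ms[i - 1] - send_ms[i - 1]
        transit = recv_ms[i] - send_ms[i]
        d = abs(transit - prev_transit)      # change in delay between packets
        jitter += (d - jitter) / 16          # RFC 3550 smoothing factor 1/16
    return jitter


print(interarrival_jitter([0, 20, 40], [100, 120, 140]))  # 0.0   (steady delay)
print(interarrival_jitter([0, 20], [100, 130]))           # 0.625 (delay grew by 10 ms)
```

A constant delay gives zero jitter even if the delay itself is large; jitter only measures how uneven the delay is.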
The five (5) major components of data communication are as follows:
• The Message
• The Sender
• The Recipient
• The Medium of Transmission
• Etiquette (Protocol)

► 1. The Message
The data or information being sent from the sender to the receiver is referred to as a
message. It could be made up of text, images, music, video, graphics, or photos,
among other things.

► 2. The sender
The sender is a device that generates and sends messages. Text, numbers, photos,
graphics, music, video, and other media may be used to convey the message. A
sender is sometimes referred to as a source, transmitter, or node. In most data
transmission systems, the computer functions as a transmitter.

► 3. Recipient
The receiver is the device that receives the message sent by the transmitter. It is
also known as a sink. The receiver is usually located somewhere other than the
sender. It can be a computer, a printer, or another computer-related device. In
addition, the receiver must be able to accept the message.

► 4. Medium of Transmission
The medium is the physical path or channel over which the message travels from the
sender to the receiver. It is required because it connects the sender and the
recipient. The medium can be wired, such as twisted-pair wire, coaxial cable, or
fiber-optic cable, or wireless, such as lasers, radio waves, or microwaves.

► 5. Etiquette (Protocol) in Data Communication


A protocol is a set of instructions for transmitting data between computers. These
protocols define how a communication channel is established, how information is
delivered, and how errors are recognized and repaired during the data communication
process.
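Error recognition is one job a protocol defines. As a sketch, the Python below uses a hypothetical frame format: the payload followed by a 4-byte CRC-32 computed with the standard `zlib.crc32`. The receiver recomputes the CRC and, on a mismatch, reports an error, which a real protocol would answer by asking for retransmission. Many real protocols use a checksum or CRC in a similar role, but their exact formats differ from this one.

```python
import struct
import zlib


def make_frame(payload):
    """Sender side: append a 4-byte big-endian CRC-32 of the payload."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def check_frame(frame):
    """Receiver side: return the payload if the CRC matches, else None.

    None stands for "error detected, ask the sender to retransmit".
    """
    if len(frame) < 4:
        return None
    payload = frame[:-4]
    (received_crc,) = struct.unpack(">I", frame[-4:])
    if zlib.crc32(payload) != received_crc:
        return None
    return payload


frame = make_frame(b"hello")
damaged = bytes([frame[0] ^ 0b1]) + frame[1:]   # flip one bit in transit
print(check_frame(frame))     # b'hello'
print(check_frame(damaged))   # None
```

A CRC only detects errors; repairing them here means retransmitting the frame, which matches the accuracy requirement that inaccurate data be sent again.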
Electronic Communication in Business
According to Bovee & Others, “Electronic communications is the transmission of
information using advanced techniques such as computer modems, facsimile
machines, voice mail, electronic mail, teleconferencing, video-cassettes and private
television network.”
Some widely used electronic communication technologies, or media of electronic
communication, are discussed below:

• Telephone: The most common and most widely used electronic device of
communication. By telephone, people can transmit information orally within a
minute. In most cases, it is the easiest and least expensive way of
communicating with people at a distance.
• Telex: Telex is an important device of modern communication technology.
Under this system, a teleprinter is used to communicate information from one
place to another. The teleprinter consists of two parts: a keyboard transmitter
and a receiver. When a message is to be sent, the typist presses a button, waits
for the dial tone, dials the desired number and types the message. The message
is printed on a small strip of paper at the receiver's end as it is typed in the
originating office. This is one of the quickest and most accurate methods of
exchanging written messages.
• Facsimile or Fax: The use of fax is gradually increasing for transmitting visual
materials such as pictures, diagrams, and illustrations. Here, the fax machine is
connected to a telephone line. The document to be transmitted is fed through the
machine, where it is electronically scanned, and signals are transmitted to the
receiving end, where the receiving machine reproduces an identical copy of the
document on a blank sheet of paper.
• Electronic Mail or E-Mail: E-mail is one of the most widely used and most
popular methods of modern communication. E-mail involves sending messages
via telecommunication links. Here, two computer terminals are connected on a
network to transfer messages from one to the other.
• Voice Mail or V-Mail: Voice mail is a form of e-mail. It is used to send the
voice of the sender to the receiver instead of a written message.
• Tele-Text: Tele-Text is an electronic device for broadcasting written messages
through television.
• Teleconferencing: Under a teleconferencing system, people at different places
can hold talks or meetings over the telephone. Everyone involved in the meeting
is able to hear each other and can share information with one another as if they
were all in one room.
• Videoconferencing: Videoconferencing is the latest version of the
teleconferencing system.
• Word Processor: A word processor is an electronic device in which a computer
is combined with a typewriter. It can greatly simplify the job of written
communication. Typing skill, basic computer literacy and word processing
software are essential for using a word processor.
• Internet: The Internet is the latest and most amazing development, and it has
changed the way we live and communicate.
• Multimedia: Multimedia is an excellent invention for upgrading the
communication system. Multimedia is a combination of many media brought
together to transfer messages. These media can include graphics, photos,
music, voice, text and animation.

Assignment
What are the advantages and disadvantages of:
(i) Internet
(ii) E-mail
(iii) E-commerce
(iv) E-banking
