Defining Information Systems
Almost all programs in business require individuals to have some knowledge of a field called
information systems. But what exactly does that term mean?
EXAMPLES OF DATA
Almost all software programs require data to do anything useful. For example, if you are editing
a document in a word processor such as Microsoft Word, the document you are working on is the
data. The word processing software can manipulate the data: create a new document, duplicate a
document, or modify a document. Some other examples of data are: an MP3 music file, a video
file, a spreadsheet, a web page, and an e-book. In some cases, such as with an e-book, you may
only have the ability to read the data.
DATABASES
The goal of many information systems is to transform data into information in order to generate
knowledge that can be used for decision making. In order to do this, the system must be able to
take data, put the data into context, and provide tools for aggregation and analysis. A database is
designed for just such a purpose. A database is an organized collection of related information. It
is an organized collection, because in a database, all data is described and associated with other
data. All information in a database should be related as well; separate databases should be
created to manage unrelated information. For example, a database that contains information
about students should not also hold information about company stock prices. Databases are not
always digital – a filing cabinet, for instance, might be considered a form of database. For the
purposes of this text, we will only consider digital databases.
RELATIONAL DATABASES
Databases can be organized in many different ways, and thus take many forms. The most popular
form of database today is the relational database. Popular examples of relational databases are
Microsoft Access, MySQL, and Oracle. A relational database is one in which data is organized
into one or more tables. Each table has a set of fields, which define the nature of the data stored
in the table. A record is one instance of a set of fields in a table. To visualize this, think of the
records as the rows of the table and the fields as the columns of the table. In a table of student
information, for example, each row represents a student and each column represents one piece
of information about the student.
In a relational database, all the tables are related by one or more fields, so that it is possible to
connect all the tables in the database through the field(s) they have in common. For each table,
one of the fields is identified as a primary key. This key is the unique identifier for each record in
the table.
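The ideas of table, field, record, and primary key can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name students, the column names, and the two sample students are assumptions made up for the example, loosely following the student fields mentioned in this text.

```python
import sqlite3

# In-memory database, so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")

# Hypothetical students table: each column is a field,
# and student_id is the primary key.
conn.execute(
    """
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL,
        email      TEXT,
        birth_year INTEGER
    )
    """
)

# Each inserted row is one record (sample data invented for illustration).
conn.executemany(
    "INSERT INTO students (first_name, last_name, email, birth_year) "
    "VALUES (?, ?, ?, ?)",
    [
        ("Ana", "Lopez", "ana@example.edu", 1995),
        ("Ben", "Okafor", "ben@example.edu", 1996),
    ],
)

# The primary key identifies exactly one record.
record = conn.execute(
    "SELECT first_name, last_name FROM students WHERE student_id = ?", (1,)
).fetchone()
print(record)
```

Because student_id is declared as the primary key, the database rejects a second record that tries to reuse an existing ID.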
NORMALIZATION
When designing a database, one important concept to understand is normalization. In simple
terms, to normalize a database means to design it in a way that: 1) reduces duplication of data
between tables and 2) gives the tables as much flexibility as possible. In the Student Clubs
database design, the design team worked to achieve these objectives. For example, to track
memberships, a simple solution might have been to create a Members field in the Clubs table
and then just list the names of all of the members there. However, this design would mean that if
a student joined two clubs, then their information would have to be entered a second time.
Instead, the designers solved this problem by using two tables: Students and Memberships. In
this design, when a student joins their first club, we first must add the student to the Students
table, where their first name, last name, e-mail address, and birth year are entered. This addition
to the Students table will generate a student ID. Now we will add a new entry to denote that the
student is a member of a specific club. This is accomplished by adding a record with the student
ID and the club ID in the Memberships table. If this student joins a second club, we do not have
to duplicate the entry of the student’s name, e-mail, and birth year; instead, we only need to
make another entry in the Memberships table of the second club’s ID and the student’s ID. The
design of the Student Clubs database also makes it simple to change the design without major
modifications to the existing structure. For example, if the design team were asked to add
functionality to the system to track faculty advisors to the clubs, we could easily accomplish this
by adding a Faculty Advisors table (similar to the Students table) and then adding a new field to
the Clubs table to hold the Faculty Advisor ID.
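The membership design described above might look roughly like the following sketch, again using Python's sqlite3 module. The exact column names, the club names, and the sample student are assumptions for illustration, not the actual Student Clubs schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the references

# Hypothetical versions of the three tables discussed in the text.
conn.executescript(
    """
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        first_name TEXT, last_name TEXT, email TEXT, birth_year INTEGER
    );
    CREATE TABLE clubs (
        club_id   INTEGER PRIMARY KEY,
        club_name TEXT
    );
    CREATE TABLE memberships (
        student_id INTEGER REFERENCES students(student_id),
        club_id    INTEGER REFERENCES clubs(club_id),
        PRIMARY KEY (student_id, club_id)
    );
    """
)

# Two invented clubs; they receive club IDs 1 and 2.
conn.executemany(
    "INSERT INTO clubs (club_name) VALUES (?)", [("Chess",), ("Robotics",)]
)

# The student's details are entered once; the database generates the ID.
cur = conn.execute(
    "INSERT INTO students (first_name, last_name, email, birth_year) "
    "VALUES (?, ?, ?, ?)",
    ("Ana", "Lopez", "ana@example.edu", 1995),
)
student_id = cur.lastrowid

# Joining two clubs adds two membership records, not two student records.
conn.executemany(
    "INSERT INTO memberships (student_id, club_id) VALUES (?, ?)",
    [(student_id, 1), (student_id, 2)],
)

clubs_joined = [
    row[0]
    for row in conn.execute(
        """
        SELECT c.club_name
        FROM memberships AS m
        JOIN clubs AS c ON c.club_id = m.club_id
        WHERE m.student_id = ?
        ORDER BY c.club_name
        """,
        (student_id,),
    )
]
student_rows = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
print(clubs_joined, student_rows)
```

The student appears in two clubs, yet the students table still holds a single row for that student, which is the duplication that normalization is meant to avoid.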
DATA TYPES
When defining the fields in a database table, we must give each field a data type. For example,
the field Birth Year is a year, so it will be a number, while First Name will be text. Most modern
databases allow for several different data types to be stored. Some of the more common data
types are listed here:
i. Text: for storing non-numeric data that is brief, generally under 256 characters. The
database designer can identify the maximum length of the text.
ii. Number: for storing numbers. There are usually a few different number types that can be
selected, depending on how large the largest number will be.
iii. Yes/No: a special form of the number data type that is (usually) one byte long, with a 0
for “No” or “False” and a 1 for “Yes” or “True”.
iv. Date/Time: a special form of the number data type that can be interpreted as a date or
a time.
v. Currency: a special form of the number data type that formats all values with a currency
indicator and two decimal places.
vi. Paragraph Text: this data type allows for text longer than 256 characters.
vii. Object: this data type allows for the storage of data that cannot be entered via keyboard,
such as an image or a music file.
There are two important reasons that we must properly define the data type of a field. First, a
data type tells the database what functions can be performed with the data. For example, if we
wish to perform mathematical functions with one of the fields, we must be sure to tell the
database that the field is a number data type. So, if we have, say, a field storing birth year, we
can subtract the number stored in that field from the current year to get age.
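As a small sketch of the age calculation, the following Python example stores birth year as an integer field and lets the database do the subtraction. The table layout, the sample student, and the helper name ages are assumptions for illustration; the year 2013 is simply borrowed from the sample date used later in this text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# birth_year is declared as a number (INTEGER), first_name as text.
conn.execute("CREATE TABLE students (first_name TEXT, birth_year INTEGER)")
conn.execute("INSERT INTO students VALUES (?, ?)", ("Ana", 1995))


def ages(connection, current_year):
    """Return (first_name, age) pairs, with the subtraction done in SQL."""
    return connection.execute(
        "SELECT first_name, ? - birth_year FROM students", (current_year,)
    ).fetchall()


print(ages(conn, 2013))
```

Since only the birth year is stored, the result is approximate: it can be off by one for students whose birthday has not yet occurred in the current year. SQLite is also fairly lenient about column types, so other database products may enforce the declared types more strictly than this sketch suggests.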
The second important reason to define data type is so that the proper amount of storage space is
allocated for our data. For example, if the First Name field is defined as a Text(50) data type,
fifty characters are allocated for each first name we want to store. Even if a first name is only
five characters long, fifty characters (bytes) will still be allocated. While this may
not seem like a big deal, if our table ends up holding 50,000 names, we are allocating 50 *
50,000 = 2,500,000 bytes for storage of these values. It may be prudent to reduce the size of the
field, so we do not waste storage space.
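The storage arithmetic can be checked with a few lines of Python. The function name fixed_width_bytes is made up for this sketch, and it assumes fixed-width storage at one byte per character, as the example in the text does; many databases store text in variable-length form, where this calculation would not apply.

```python
def fixed_width_bytes(field_width, record_count, bytes_per_char=1):
    """Bytes reserved when every record gets the full field width."""
    return field_width * record_count * bytes_per_char


# The case from the text: a 50-character field and 50,000 names.
print(fixed_width_bytes(50, 50_000))

# The same table with the field reduced to 20 characters (an arbitrary choice).
print(fixed_width_bytes(20, 50_000))
```

Under these assumptions, shrinking the field from 50 to 20 characters reduces the reserved space from 2,500,000 bytes to 1,000,000 bytes.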
ENTERPRISE DATABASES
A database that can only be used by a single user at a time is not going to meet the needs of most
organizations. As computers have become networked and are now joined worldwide via the
Internet, a class of database has emerged that can be accessed by two, ten, or even a million
people. These databases are sometimes installed on a single computer to be accessed by a group
of people at a single location. Other times, they are installed over several servers worldwide,
meant to be accessed by millions. These relational enterprise database packages are built and
supported by companies such as Oracle, Microsoft, and IBM. The open-source MySQL is also
an enterprise database.
The relational database model can be difficult to scale. The term scale here refers to
a database getting larger and larger, being distributed on a larger number of computers connected
via a network. Some companies are looking to provide large-scale database solutions by moving
away from the relational model to other, more flexible models. For example, Google now offers
the App Engine Datastore, which is a NoSQL database. Developers can use the App Engine
Datastore to develop applications that access data from anywhere in the world. Amazon.com
offers several database services for enterprise use, including Amazon RDS, which is a relational
database service, and Amazon DynamoDB, a NoSQL enterprise solution.
DATA WAREHOUSE
As organizations have begun to utilize databases as the centerpiece of their operations, the need
to fully understand and leverage the data they are collecting has become more and more
apparent. However, directly analyzing the data that is needed for day-to-day operations is not a
good idea; we do not want to tax the operations of the company more than we need to. Further,
organizations also want to analyze data in a historical sense: How does the data we have today
compare with the same set of data this time last month, or last year? From these needs arose the
concept of the data warehouse. The concept of the data warehouse is simple: extract data from
one or more of the organization’s databases and load it into the data warehouse (which is itself
another database) for storage and analysis. However, the execution of this concept is not that
simple. A data warehouse should be designed so that it meets the following criteria:
• It uses non-operational data. This means that the data warehouse is using a copy of data
from the active databases that the company uses in its day-to-day operations, so the data
warehouse must pull data from the existing databases on a regular, scheduled basis.
• The data is time-variant. This means that whenever data is loaded into the data
warehouse, it receives a time stamp, which allows for comparisons between different
time periods.
• The data is standardized. Because the data in a data warehouse usually comes from
several different sources, it is possible that the data does not use the same definitions or
units. For example, our Events table in our Student Clubs database lists the event dates
using the mm/dd/yyyy format (e.g., 01/10/2013). A table in another database might use
the format yy/mm/dd (e.g., 13/01/10) for dates. In order for the data warehouse to match
up dates, a standard date format would have to be agreed upon and all data loaded into
the data warehouse would have to be converted to use this standard format. This process
is called extraction-transformation-load (ETL).
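The transformation step for the two date formats above could be sketched in Python as follows. The source names student_clubs and other_db, the function name transform, and the choice of ISO 8601 (yyyy-mm-dd) as the agreed standard format are all assumptions made for this illustration. The sketch also stamps each row with a load time, as the time-variant criterion requires.

```python
from datetime import datetime, timezone

# Date format used by each (hypothetical) source database.
SOURCE_FORMATS = {
    "student_clubs": "%m/%d/%Y",  # mm/dd/yyyy, e.g. 01/10/2013
    "other_db": "%y/%m/%d",       # yy/mm/dd, e.g. 13/01/10
}


def transform(source, raw_date, loaded_at):
    """Convert one extracted date to the standard format and time-stamp it."""
    parsed = datetime.strptime(raw_date, SOURCE_FORMATS[source]).date()
    return {
        "source": source,
        "event_date": parsed.isoformat(),   # standardized date
        "loaded_at": loaded_at.isoformat(), # time stamp for the load
    }


# Extract: the two sample dates from the text, one per source.
extracted = [("student_clubs", "01/10/2013"), ("other_db", "13/01/10")]

# Transform and load into a list standing in for the warehouse table.
load_time = datetime.now(timezone.utc)
warehouse_rows = [transform(src, raw, load_time) for src, raw in extracted]

for row in warehouse_rows:
    print(row)
```

After the transformation, both sample dates are stored as 2013-01-10, so records from the two sources can be matched directly.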
There are two primary schools of thought when designing a data warehouse: bottom-up and
top-down. The bottom-up approach starts by creating small data warehouses, called data marts, to
solve specific business problems. As these data marts are created, they can be combined into a
larger data warehouse. The top-down approach suggests that we should start by creating an
enterprise-wide data warehouse and then, as specific business needs are identified, create smaller
data marts from the data warehouse.
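As a rough sketch of the top-down approach, the following Python example derives a small data mart from a larger warehouse table. The table names warehouse_events and chess_mart, the attendance column, and the sample rows are all invented for this illustration; a real data mart would normally involve far more design work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical enterprise-wide warehouse table.
conn.execute(
    "CREATE TABLE warehouse_events (club TEXT, event_date TEXT, attendance INTEGER)"
)
conn.executemany(
    "INSERT INTO warehouse_events VALUES (?, ?, ?)",
    [
        ("Chess", "2013-01-10", 12),
        ("Robotics", "2013-01-12", 30),
        ("Chess", "2013-02-14", 15),
    ],
)

# Top-down: carve a data mart for one specific need out of the warehouse.
conn.execute(
    """
    CREATE TABLE chess_mart AS
    SELECT event_date, attendance
    FROM warehouse_events
    WHERE club = 'Chess'
    """
)

mart_rows = conn.execute(
    "SELECT event_date, attendance FROM chess_mart ORDER BY event_date"
).fetchall()
print(mart_rows)
```

The bottom-up approach would run in the opposite direction: several small tables like chess_mart would be built first and later combined into one warehouse table.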