Database Management Systems Versus File Management Systems
Advantages:
• Greater flexibility
• Good for larger databases
• Greater processing power
• Fits the needs of many medium to large-sized organizations
• Storage for all relevant data
• Provides user views relevant to tasks performed
• Ensures data integrity by managing transactions (ACID test = atomicity,
consistency, isolation, durability)
• Supports simultaneous access
• Enforces design criteria in relation to data format and structure
• Provides backup and recovery controls
• Advanced security
Disadvantages:
• Difficult to learn
• Packaged separately from the operating system (e.g. Oracle, Microsoft Access,
Lotus/IBM Approach, Borland Paradox, Claris FileMaker Pro)
• Slower processing speeds
• Requires skilled administrators
• Expensive
The goals of a DBMS may include the following:
• Data storage, retrieval, and update (while hiding the internal physical
implementation details)
• A user-accessible catalog
• Transaction support
• Concurrency control services (multi-user update functionality)
• Recovery services (damaged database must be returned to a consistent
state)
• Authorization services (security)
• Support for data communication
• Integrity services (i.e. constraints)
• Services to promote data independence
• Utility services (e.g. importing, monitoring, performance, record deletion)
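As a rough illustration of the transaction support and integrity services listed above, here is a minimal sketch using Python's built-in sqlite3 module. The account table, the names and the balances are made up for the example; the point is only that a transfer either happens completely or not at all.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            # Credit first, so a failure on the debit must undo this update.
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

# Hypothetical table: the CHECK constraint forbids negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany(
        "INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "bob", "alice", 500)  # bob cannot cover this
print(ok, failed, balances(conn))
```

The second transfer breaks the constraint on its second statement, so the credit already applied to alice is rolled back along with it. Doing the same with plain files would mean writing that undo logic by hand.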
The components to facilitate the goals of a DBMS may include the following:
• Query processor
• Data Manipulation Language preprocessor
• Database manager (software components including authorization control,
command processor, integrity checker, query optimizer, transaction
manager, scheduler, recovery manager, and buffer manager)
• Data Definition Language compiler
• File manager
• Catalog manager
say you go the flat file route....(maybe i'll start a new thread)
I'm curious as to how many files in a single directory have you
guys stored in the past?
I mean, can you store 1 million files in a folder on a windows
server? unix server?
Will file access times differ between unix and windows2003?
Lord Majestic 2:46 am on Sep. 7, 2005 (utc 0)
"I mean, can you store 1 million files in a folder on a windows
server? unix server?"
Possible, but it is a very bad idea to have so many files -- I
know, I have got 130,000+ files in one directory :o
physics 3:04 am on Sep. 7, 2005 (utc 0)
I use flat files to cache database output for data that doesn't
change very often.
I pull the data out of the database and store it in a flat file.
It is quicker, especially if you have a lot of processing to do (if
you're using joins for example) whenever you retrieve the data.
Hester 9:03 am on Sep. 15, 2005 (utc 0)
Hmm, this forum is actually built on flat files!
This is first of all assuming that the code that you write is
'more efficient' than the code that the developers of the
database software have written. Plus you have to take the
time to write all of those functions and debug them. Also,
there are other things to consider such as the fact that if you
have tons of data stored on a hard disk in all different files
then every time you want to access a file a file handle has to
be opened at some random point on the disk, etc. With a
database it's stored in a 'central' location and the handle is
open... There are also all sorts of creature comforts like the
MySQL slow query log and things like mytop so you can keep
track of what queries might be slowing your system down. I
recently wrote an application that initially used flat files, then
switched to MySQL; the speedup was immediately evident.
Finally, yes many sites are based on flat files but they would
probably be better off with an 'advanced' database system.
Hester 3:39 pm on Sep. 16, 2005 (utc 0)
Only one file to backup too.
Pm's Explanation
Pm: I chose flat files to store PmWiki pages because I haven't seen any real
advantages of using a database, and there are definitely some disadvantages.
For the standard operations (view, edit, page revisions), holding the
information in flat files is clearly faster than accessing them in a database, and
with page caching abilities (coming soon) it'll be even faster. The only
operations that really benefit are searches, but I've always believed that for
fast, flexible search capabilities it's much better to use existing search programs
such as ht://Dig or Google over reinventing another search engine. PmWiki's
Site.Search is functional/fast enough for most purposes, and if more
performance is needed it's just better to switch to a real search engine.
Indeed, as of January 2004 the Wikipedia uses a MySQL database to store its
190K+ entries, but even with the database Wikipedia has disabled its online
search because of performance issues and just forwards search queries directly to
Google.
And there are big disadvantages to using a database -- with a database we'd have
to write a bunch of "administrative" tools/scripts to handle things such as mass
page deletions in the database, backups/restores of the pages, recovering pages
that have been wrongly deleted, etc. Much of that administrative programming
overhead is eliminated by using a flat file system, as admins can use existing tools
(FTP clients, web-based file/directory managers, shell commands). They are
already comfortable with the administrative tools. It's also much easier to build
sophisticated and customized page management tools and scripts for specialized
applications.
Finally, PmWiki is already structured such that the flat file structure can be easily
replaced by a database if it ever proves necessary. However, even PmWiki sites
with more than 40 000 pages function well in a flat file system without any
noticeable performance problems.
PmWiki supports the ability to subdivide the wiki.d/ directory into separate
subdirectories for each group, avoiding the "too large" directory problem. Check
out the Cookbook:PerGroupSubDirectories for more information.
Comments:
• Flat files are indeed much easier to manage and my experience shows
that there is no problem at all for PmWiki. Still I had problems convincing
my boss to use PmWiki since it is not using a "real" database. Ever thought
of using subdirectories for each group like in Uploads? There are known
issues on Solaris for directories containing more than 20,000 files. -- Uli
Re-enable the link index and run a few backlink searches (even if they
time out). PmWiki will incrementally build the link index. Once the link
index is built, everything will be fast and there won't be a big cost in
keeping the link index up-to-date. --Pm
• Another BIG advantage of flat files is that they are easy to edit directly. --
Babak
o Exactly! I know many scenarios where data-loss, caused by
hardware or transfer failure (storage medium crashes, power
dropouts and the likes), was easy to fix by simply using an editor on
the (flatfile) server's commandline and changing back what was
causing errors. I've never been able to do this with similar ease for
MySQL (and in such cases hate my job). -- SomeSysAdmin
• Maybe the reason flat files work so well is that a file system IS a
hierarchical database -- William
• Is a database more secure? That extra password protection needed to
access MySQL databases must mean something... Right? -- Xen
o Then why have no sites running PmWiki with flatfiles (that I know
of) ever been compromised? ;-) -- Julius
o If you can get access to PmWiki's flat files, you could also get access
to the php script containing the database password. So it doesn't
really provide any extra security. -- Andrew
Exactly. But one should never store the php file containing the
(non-flatfile) database password in a web-server accessible
location. Instead do an include and put the php somewhere
outside of the web/doc root. -- Julius
• I think the biggest disadvantage of using a flatfile system is that it takes the
programmer too much time to design it and to keep it stable,
especially when more and more new features are added to the project
and more and more requirements come up. This also adds risk to
users' data, as bugs are more likely to be introduced by program updates.
It also makes compatibility problems harder to resolve. On the other
hand, a flat file system does work more efficiently than a database in most
situations. -- Adam
o I would have to disagree (with part of that). Programming
something to speak to (and read from) MySQL for example can be
just as painful, precisely because it is not your own code or design.
That can be a huge disadvantage: You never know when an updated
MySQL needs changed queries, when it will do what, if it will do
what you need and so on. -- Julius
• I think that this could be an endless debate because the line is often thin
between advantage and disadvantage, imho the safe bet will always be to
give the option and let people choose given their own needs, cheers. -- h3
o I don't think the line is that thin. With a separate database you will
always have a much bigger chance of crashes and downtime. You
make yourself more dependent by needing yet another service to be
running and backed up separately etc. Just count the times you see
things not working and giving you MySQL errors online. I have rarely,
if at all, seen that occur with flatfile databases. -- Ben
Many people already have a copy of MySQL running, so that
isn't a problem. The MySQL problems come from sites that run
too many or too slow queries for their hardware. Something
as simple as retrieving a wiki page isn't going to have trouble
like that.
More people don't have a copy of MySQL running. In
fact, I know more people who don't run it on their
servers, precisely because it is such a resource
monster for its purpose: Merely some text-file storage
system. -- Steven
• Flat files have a very important advantage -- you can diff and merge pages
with merge tools. With that you would be able to keep copies of a wiki
on all your computers and merge them periodically. I think lots of
people need this function. At least, I switched to DokuWiki from MediaWiki
just because of this. -- Edward
• Databases are always on top of a filesystem -- at least all of the "real"
databases store their data on a filesystem. They provide an abstraction
layer for purposes such as authentication, transactions or simply
convenience across different OSes, and have a common query syntax (SQL).
Therefore performance depends mainly on the following factors:
o Performance of the filesystem
o Efficient caching strategies (for data, queries, ...)
o Efficient internal file organization
o Efficient code (client and server)-- Heiko
• Most file systems map files to hard sectors on a disk. Databases offer a
level of virtualisation: the sectors can be on any disk or server. The result is
that you can use one server/disk for the DB, another server for PHP and a third
for the web server. You can share out load and get better overall performance
even in very heavy usage. Of course that may not be the goal of PmWiki, ;-).
-- Peter
Well you can always use NFS if you want your files on another server. But in both
cases, NFS or a DB, running them on another server is actually likely to increase
your latency and not necessarily increase your throughput. The advantage of a
separate DB is more apparent when you need more than one client accessing it at
the same time, which, of course, you can also do with NFS. The DB might provide
better locking mechanisms, but they are not likely to be important to PmWiki (not
write-heavy enough). How do you suggest running PHP on another server than
your web server? And, whatever your solution for this, wouldn't it also be
available without a DB? -- Martin Fick
• Just to say, I prefer flat files in this case just because my home server is an
MMX, but aren't SQL servers loaded in memory? Memory access time is
much slower than HD, not to mention the really old ones (mine is
2GbATA100). Of course not all the pages should be loaded in
memory all the time, but for the most accessed ones... Also, it is easier to
provide a single download file with all the wiki data for the user
who wants to have it offline. He will just need a way to read it... And my
third point is that it is better for a wiki because no JOIN is needed.
Logical vs. Physical File Organisation
The logical storage of a file is how it looks to the user when it is being processed.
A serial access file has data stored in the order in which it was written. New
data goes to the end of the file. To read a record from the file it is necessary to
read through all of the records from the start of the file until reaching the
required record.
A sequential access file has data stored in the order of a key field.
An indexed sequential file stores data in the order of a key field, but also has
an index holding the key field values and the address at which they are stored.
This allows both sequential and direct access.
A direct access file is one where any record can be accessed without having to
access other records first. This is also known as Random Access.
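The difference between serial and direct access can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fixed 16-byte record layout, the sample names and the helper functions are all invented for the example, and an in-memory buffer stands in for a file on disk. Records are written in key order and an index of key to byte offset is kept alongside, which is roughly the idea behind an indexed sequential file.

```python
import io
import struct

# Hypothetical fixed-length record: 4-byte integer key + 12-byte name field.
RECORD = struct.Struct("<i12s")

def pack(key, name):
    return RECORD.pack(key, name.encode("ascii"))

def unpack(raw):
    key, name = RECORD.unpack(raw)
    return key, name.rstrip(b"\x00").decode("ascii")

def build_file(records):
    """Write records in key order; return the file and a key -> offset index."""
    f = io.BytesIO()
    index = {}
    for key, name in sorted(records):
        index[key] = f.tell()
        f.write(pack(key, name))
    return f, index

def serial_read(f, wanted):
    """Read from the start until the wanted key turns up; count records read."""
    f.seek(0)
    reads = 0
    while True:
        raw = f.read(RECORD.size)
        if len(raw) < RECORD.size:
            return None, reads  # reached the end without finding the key
        reads += 1
        key, name = unpack(raw)
        if key == wanted:
            return name, reads

def direct_read(f, index, wanted):
    """Jump straight to the record via the index; one read at most."""
    offset = index.get(wanted)
    if offset is None:
        return None, 0
    f.seek(offset)
    return unpack(f.read(RECORD.size))[1], 1

f, index = build_file([(30, "Carol"), (10, "Alice"), (20, "Bob"), (40, "Dave")])
print(serial_read(f, 40))
print(direct_read(f, index, 40))
```

Finding the last record serially touches every record before it, while the indexed lookup reads exactly one. On a four-record file that hardly matters, but the gap grows with the size of the file.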
Records do not have to be put into any order when the file is created.
Selected records can be accessed far more quickly from direct access files.
Updating master files with transaction files is made easier using sequential
files, since they are both in the same order to begin with.
What would be the best method of access to find one record in a large file -
sequential or direct?
Some file organisations are better than others for particular tasks. These are
some of the reasons why a particular file organisation may be chosen:
The size of the file - In large files sequential searches take a long time, so
direct access is better. In small files time delay is less important, so
sequential access is acceptable.
The type of storage media - Magnetic tape is a serial access medium so direct
access cannot be used.
Relational Databases
Flat files
The earliest data storage computer systems used flat files. A flat file is like
information stored in a grid or table.
Each row in the table contains a record: information about a particular person
or thing.
Each column in the file contains information on one field, for example Name,
Type, and so on.
Primary key
No two records in a file can be the same or it will lead to confusion. For example,
if two people have the same name, there must be some other means of identifying
which record refers to which person.
Therefore each record has to have a unique identifier - something that makes it
different from all the other records - called the primary key.
A surname, for example, is a poor choice of primary key for several reasons:
• It may not be unique: there are many people with the name Smith.
• It may change (for example if the person marries), and you must never alter
the primary key.
• It may be lengthy, so there is more chance of typing it incorrectly.
• The database software creates an index of key fields, and the shorter the key
field, the smaller the index, so sorting and searching operations will be
performed faster. A surname can be long.
It is a good idea to create a special field to act as the primary key. Sometimes
there is an obvious candidate, for example in a garage keeping a file on cars it
repairs, the registration number would be an ideal primary key.
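A short sketch of the garage example, using Python's built-in sqlite3 module, shows how a database enforces the primary key. The table name, column names and registration numbers are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical garage table: the registration number is the primary key.
conn.execute(
    "CREATE TABLE car ("
    " registration TEXT PRIMARY KEY,"
    " owner_surname TEXT NOT NULL)"
)
conn.execute("INSERT INTO car VALUES ('AB12 CDE', 'Smith')")
conn.execute("INSERT INTO car VALUES ('XY34 ZZZ', 'Smith')")  # same surname is fine

try:
    # A second record with an existing registration breaks uniqueness.
    conn.execute("INSERT INTO car VALUES ('AB12 CDE', 'Jones')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

car_count = conn.execute("SELECT COUNT(*) FROM car").fetchone()[0]
print(duplicate_rejected, car_count)
```

Two owners called Smith cause no problem because the surname is not the key, whereas the repeated registration number is refused and only the first two records are stored.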
In flat files data tends to be stored in several places. For example, in a school
information system information about teachers may be stored on the file for
classes, as well as on an administration file holding employment information.
This is very inefficient because repeating data wastes disk space. It could also lead
to inconsistent data if the teacher's address was stored differently in the two files.
We could store the information more efficiently by using a database with two
tables: the class table and the teacher table.
This diagram shows the relationship between the teacher and the class. The
diagram shows that a teacher can teach more than one class. It also shows that
each class can only be taught by one teacher.
We can find out the name and address of each teacher from the teacher table.
There can be more than one table in a database. A flat file database has only one
table. Each table in a database contains information about an entity, for example
teacher, class, and so on.
In a relational database the tables are related. This means the tables are linked
together in some way. For example in the school database, we can create a
relationship between the teacher code field in the teacher table and the teacher
code field in the class table.
In the class table, class code is the primary key as it uniquely identifies the class.
The teacher code, however, is not unique in the class table, as one teacher can teach
more than one class. By looking in the class table we can find out the teacher code
of the teacher who takes that class. By looking in the teacher table we can find out
more information about the teacher.
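The two-table school design can be sketched with Python's built-in sqlite3 module. The table and column names, the teachers and the class codes below are all assumptions made for the example. Each teacher's details are stored once, and the class table holds only the teacher code that links to them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only if asked
conn.executescript("""
CREATE TABLE teacher (
    teacher_code TEXT PRIMARY KEY,
    name         TEXT NOT NULL,
    address      TEXT NOT NULL
);
CREATE TABLE school_class (
    class_code   TEXT PRIMARY KEY,
    teacher_code TEXT NOT NULL REFERENCES teacher(teacher_code)
);
INSERT INTO teacher VALUES ('T1', 'Ms Patel', '1 High Street');
INSERT INTO teacher VALUES ('T2', 'Mr Jones', '2 Low Road');
INSERT INTO school_class VALUES ('7A', 'T1');
INSERT INTO school_class VALUES ('8B', 'T1');
INSERT INTO school_class VALUES ('9C', 'T2');
""")

def teacher_for(class_code):
    """Follow the teacher code from the class table into the teacher table."""
    return conn.execute(
        "SELECT t.name, t.address FROM school_class c"
        " JOIN teacher t ON t.teacher_code = c.teacher_code"
        " WHERE c.class_code = ?",
        (class_code,),
    ).fetchone()

def classes_for(teacher_code):
    """One teacher can appear against many classes (one-to-many)."""
    rows = conn.execute(
        "SELECT class_code FROM school_class"
        " WHERE teacher_code = ? ORDER BY class_code",
        (teacher_code,),
    )
    return [row[0] for row in rows]

print(teacher_for("8B"))
print(classes_for("T1"))
```

Because the address lives only in the teacher table, changing it there changes it for every class that teacher takes, which avoids the inconsistency described for the flat file version.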
Flat files have been used by computers for many years. They are usually used for
one particular specific purpose. For example, a company might maintain an
employee file used to produce a pay slip. The personnel department might also
have a file of employees' records, which holds some different data.
The file approach is program-oriented: the needs of the program determine how
the data is stored. The database approach is data-oriented: the type of data
determines how it is stored. This gives it a number of advantages over a file-based
system.
NB
Types of relationship
• one-to-one
• one-to-many
• many-to-many
Examples
Each product in a supermarket has a bar code. The relationship between product
name and its bar code is one-to-one.
There are many different products, each on sale in many different stores. This is
an example of a many-to-many relationship.
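A relational database usually represents a many-to-many relationship with a third linking table. The sketch below uses Python's built-in sqlite3 module; the table names, bar codes, products and towns are invented for illustration. The one-to-one link between a product name and its bar code is expressed by making both unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One-to-one: each product name has exactly one bar code, and vice versa.
CREATE TABLE product (bar_code TEXT PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE store (store_id INTEGER PRIMARY KEY, town TEXT NOT NULL);
-- Many-to-many: one row per (product, store) pair.
CREATE TABLE stocked_in (
    bar_code TEXT NOT NULL REFERENCES product(bar_code),
    store_id INTEGER NOT NULL REFERENCES store(store_id),
    PRIMARY KEY (bar_code, store_id)
);
INSERT INTO product VALUES ('5000000000017', 'Tea'), ('5000000000024', 'Milk');
INSERT INTO store VALUES (1, 'Leeds'), (2, 'York');
INSERT INTO stocked_in VALUES
    ('5000000000017', 1), ('5000000000017', 2), ('5000000000024', 1);
""")

def stores_selling(product_name):
    rows = conn.execute(
        "SELECT s.town FROM product p"
        " JOIN stocked_in si ON si.bar_code = p.bar_code"
        " JOIN store s ON s.store_id = si.store_id"
        " WHERE p.name = ? ORDER BY s.town",
        (product_name,),
    )
    return [row[0] for row in rows]

def products_in(town):
    rows = conn.execute(
        "SELECT p.name FROM store s"
        " JOIN stocked_in si ON si.store_id = s.store_id"
        " JOIN product p ON p.bar_code = si.bar_code"
        " WHERE s.town = ? ORDER BY p.name",
        (town,),
    )
    return [row[0] for row in rows]

print(stores_selling("Tea"))
print(products_in("Leeds"))
```

The linking table turns the many-to-many relationship into two one-to-many relationships, so the same query pattern works in either direction.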
Database Normalisation
Database administration
• what tables are included and the fields in these tables
• the name and description of each data item
• the characteristics of each data item, such as its length and data type
• any restrictions on the value of certain fields
• the relationships between data items
• control information such as who is allowed access to files
• whether users can change data or only read it
The DBMS is the program that provides an interface between the database and
the user in order to make access to the data as simple as possible. It has several
other functions:
1. Data storage, retrieval and update. Users must be able to store, retrieve and
update information as easily as possible. These users are not computer
experts and do not need to be aware of the internal structure of the
database or how to set it up.
2. Creation and maintenance of the data dictionary.
3. Managing the facilities for sharing the database. Many databases
need a multi-access facility. Two or more people must be able to access a
record simultaneously and to update it without a problem.
4. Back-up and recovery. Information in the database must not be lost in
the event of system failure.
5. Security. The DBMS must check passwords and allow appropriate
privileges.
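Most database systems let you query the data dictionary itself. As one concrete case, SQLite keeps its catalog in a table called sqlite_master and reports column details through PRAGMA table_info. The sketch below uses Python's built-in sqlite3 module; the member table and the two helper functions are made up for the example, and other DBMSs expose the same kind of information through different catalog tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, created only so the catalog has something to describe.
conn.execute(
    "CREATE TABLE member ("
    " member_code INTEGER PRIMARY KEY,"
    " surname TEXT NOT NULL,"
    " date_of_birth TEXT)"
)

def table_names(conn):
    """List the tables recorded in SQLite's catalog."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]

def describe(conn, table):
    """Return (column, declared type, not-null flag, primary-key flag) per column.

    `table` is interpolated into the statement, so pass trusted names only.
    """
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [
        (name, col_type, bool(notnull), bool(pk))
        for _cid, name, col_type, notnull, _default, pk
        in conn.execute(f"PRAGMA table_info({table})")
    ]

print(table_names(conn))
for column in describe(conn, "member"):
    print(column)
```

The DBMS maintains this information automatically whenever a table is created or altered, so no separate bookkeeping is needed.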
Database security
As we have already seen, data stored in a database is very valuable. Good security
to prevent loss, theft or corruption of data is vital. Relational databases such as
Microsoft Access and Paradox are multi-access databases. This means that on a
network, more than one user may access the same database at the same time.
How can the software cope with two or more users opening the same file and making
alterations, and yet maintain the integrity of the database?
Relational databases provide different methods of database security:
1. The simplest method is to set a password for opening the database. Once
set, a password must be entered whenever the database is opened. Only
users who type the correct password will be allowed to open the database.
The password will be encrypted so that it can’t be accessed simply by
reading the database file. Once a database is open, all the features are
available to the user. For a database on a stand alone computer, setting a
password is normally sufficient protection.
2. A more flexible method of database security is called user-level security,
which is similar to the sort of security found on networks. Users must type
a username and password when they load the DBMS. The database
administrator will allocate users to a group. Different groups will be
allowed to see different parts of the database.
Why should a video shop use a database? Could a manual system be better than a
database system? Both systems record details of members and who has hired
which video. (This could easily be done in a paper-based system by using a list of
all videos and the name of the hirer written next to it.)
A manual system is cheaper, unlikely to break down and requires little training.
However the computerised system will probably be better for the following
reasons:
• Better service to customers. Using a bar code reader to enter the video code and
the member code is very quick. Queues at the counter will be shorter.
• The names and addresses of members can be used for advertising purposes in a
direct-mail shot. The database can be queried to come up with a list of
people who haven’t hired a video for six months and a letter written offering
them a discount if they hire a film this week. The letter can be personalised
using the mail-merge from a word-processing package.
• Similarly, automatic reminders can be sent out to members who have not returned
a video by the due date.
• The database can be extended to include the member’s date of birth. The
computer can be used to ensure that a member is old enough to hire, say, an
18-rated video.
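Two of the queries described above, finding members who have not hired a video for six months and checking a member's age, might look something like this with Python's built-in sqlite3 module. The table layout, member names, dates and the 182-day approximation of six months are all assumptions made for the sketch.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE member (
    member_code   INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    date_of_birth TEXT NOT NULL      -- ISO format, YYYY-MM-DD
);
CREATE TABLE hire (
    video_code  TEXT NOT NULL,
    member_code INTEGER NOT NULL REFERENCES member(member_code),
    hired_on    TEXT NOT NULL        -- ISO format, YYYY-MM-DD
);
INSERT INTO member VALUES (1, 'Ann', '1980-05-01'), (2, 'Ben', '1990-01-15'),
                          (3, 'Cal', '2010-09-30');
INSERT INTO hire VALUES ('V1', 1, '2024-05-20'), ('V2', 2, '2023-10-02');
""")

def lapsed_members(conn, today, days=182):
    """Members with no hire in roughly the last six months, or none at all."""
    cutoff = (today - timedelta(days=days)).isoformat()
    rows = conn.execute(
        "SELECT m.name FROM member m"
        " LEFT JOIN hire h ON h.member_code = m.member_code"
        " GROUP BY m.member_code"
        " HAVING MAX(h.hired_on) IS NULL OR MAX(h.hired_on) < ?"
        " ORDER BY m.name",
        (cutoff,),
    )
    return [row[0] for row in rows]

def old_enough(conn, member_code, today, minimum_age=18):
    """Check the member's age on `today` against a minimum age."""
    dob = date.fromisoformat(
        conn.execute(
            "SELECT date_of_birth FROM member WHERE member_code = ?",
            (member_code,),
        ).fetchone()[0]
    )
    # Subtract one if this year's birthday has not happened yet.
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    return age >= minimum_age

today = date(2024, 6, 1)  # fixed date so the example is repeatable
print(lapsed_members(conn, today))
print(old_enough(conn, 1, today), old_enough(conn, 3, today))
```

The list returned by lapsed_members is the kind of result that could be fed into a mail-merge, and the age check could run at the counter when the member code is scanned.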