File Format: 2 Patents
2 Patents
A file format is a standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage medium. File formats may be either proprietary or free and may be either unpublished or open.
Some file formats are designed for very particular types of data: PNG files, for example, store bitmapped images using lossless data compression. Other file formats, however, are designed for storage of several different types of data: the Ogg format can act as a container for different types of multimedia, including any combination of audio and video, with or without text (such as subtitles), and metadata. A text file can contain any stream of characters, including possible control characters, and is encoded in one of various character encoding schemes. Some file formats, such as HTML, scalable vector graphics, and the source code of computer software, are text files with defined syntaxes that allow them to be used for specific purposes.
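To make the point about character encodings concrete, the following Python sketch encodes one short string in two ways. The choice of UTF-8 and Latin-1 (ISO 8859-1) is only an assumption for illustration; any two encodings that disagree on a character would show the same effect.

```python
# Illustrative only: the same characters stored under two different
# character encodings produce different bytes on disk.
text = "café"

utf8_bytes = text.encode("utf-8")      # 'é' (U+00E9) takes two bytes in UTF-8
latin1_bytes = text.encode("latin-1")  # and a single byte in ISO 8859-1

print(utf8_bytes)    # b'caf\xc3\xa9'
print(latin1_bytes)  # b'caf\xe9'

# Reading UTF-8 bytes as if they were Latin-1 silently garbles the text.
print(utf8_bytes.decode("latin-1"))  # cafÃ©
```

Because nothing in the bytes themselves says which scheme was used, a program reading a text file must know or guess the encoding.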
3 Identifying file type
Different operating systems have traditionally taken different approaches to determining a particular file's format, with each approach having its own advantages and disadvantages. Most modern operating systems and individual applications need to use all of the following approaches to read foreign file formats, if not work with them completely.
Specifications
One popular method used by many operating systems, including Windows, Mac OS X, CP/M, DOS, VMS, and VM/CMS, is to determine the format of a file based on the end of its name, specifically the letters following the final period. This portion of the filename is known as the filename extension. For example, HTML documents are identified by names that end with .html (or .htm), and GIF images by .gif. In the original FAT filesystem, file names were limited to an eight-character identifier and a three-character extension, known as an 8.3 filename. Because the number of possible three-letter extensions is limited, a given extension is often associated with more than one program. Many formats still use three-character extensions even though modern operating systems and application programs no longer have this limitation. Since there is no standard list of extensions, more than one format can use the same extension, which can confuse both the operating system and users.
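Extension-based identification amounts to a table lookup on the text after the final period. The Python sketch below assumes a tiny, hypothetical table (`EXTENSION_TABLE`); the `.dat` entry deliberately maps to two made-up formats to show how a shared extension leaves the result ambiguous.

```python
import os
from typing import Dict, List

# Hypothetical lookup table for illustration only. ".dat" is deliberately
# mapped to two made-up formats to model an ambiguous extension.
EXTENSION_TABLE: Dict[str, List[str]] = {
    ".html": ["HTML document"],
    ".htm": ["HTML document"],
    ".gif": ["GIF image"],
    ".dat": ["hypothetical format A", "hypothetical format B"],
}

def guess_by_extension(filename: str) -> List[str]:
    """Return candidate formats based only on the text after the final period."""
    _, ext = os.path.splitext(filename)  # ("INDEX", ".HTM") for "INDEX.HTM"
    return EXTENSION_TABLE.get(ext.lower(), [])

print(guess_by_extension("INDEX.HTM"))  # ['HTML document']
print(guess_by_extension("movie.dat"))  # ['hypothetical format A', 'hypothetical format B']
print(guess_by_extension("README"))     # []
```

Lower-casing the extension is a design choice of this sketch, not a rule every operating system follows. Note also that the file's contents are never consulted, so a renamed file is misidentified.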
If the developer of a format doesn't publish free specifications, another developer looking to utilize that kind of file must either reverse engineer the file to find out how to read it or acquire the specification document from the format's developers for a fee and by signing a non-disclosure agreement. The latter approach is possible only when a formal specification document exists. Both strategies require significant time, money, or both; therefore, file formats with publicly available specifications tend to be supported by more programs.
3.2 Internal metadata
A second way to identify a file format is to use information regarding the format stored inside the file itself, either information meant for this purpose or binary strings that …
One way to incorporate file type metadata, often associated with Unix and its derivatives, is just to store a magic …
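A magic number is a fixed byte sequence, usually at the very start of a file, that marks its format independently of the filename. The Python sketch below checks a few widely documented signatures (PNG, GIF87a/GIF89a, PDF, ZIP); it works in the spirit of the Unix file command but is far simpler and is not that tool's implementation. The function names are hypothetical.

```python
from typing import Optional

# A few widely documented leading signatures ("magic numbers").
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
]

def identify_by_magic(header: bytes) -> Optional[str]:
    """Return a format name if the header starts with a known signature."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None  # unknown: the caller may fall back to other methods

def identify_file(path: str) -> Optional[str]:
    """Read only the first few bytes; the filename itself is ignored."""
    with open(path, "rb") as f:
        return identify_by_magic(f.read(16))
```

Formats built on top of ZIP share its signature, so a match can narrow the format without pinning it down exactly.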
3.3 External metadata
3.3.2
A Uniform Type Identifier (UTI) is a method used in Mac OS X for uniquely identifying typed classes of entity, such as file formats. It was developed by Apple as a replacement for OSType (type & creator codes).
The UTI is a Core Foundation string, which uses a reverse-DNS string. Some common and standard types use a domain called public (e.g. public.png for a Portable Network Graphics image), while other domains can be used for third-party types (e.g. com.adobe.pdf for Portable Document Format). UTIs can be defined within a hierarchical structure, known as a conformance hierarchy. Thus, public.png conforms to a supertype of public.image, which itself conforms to a supertype of public.data. A UTI can exist in multiple hierarchies, which provides great flexibility.
In addition to file formats, UTIs can also be used for other entities which can exist in OS X, including:
Pasteboard data
Folders (directories)
Translatable types (as handled by the Translation Manager)
Bundles
… attributes can still be read and written by Win32 programs, but the data must be entirely parsed by applications.
3.3.4 POSIX extended attributes
On Unix and Unix-like systems, the ext2, ext3, ReiserFS version 3, XFS, JFS, FFS, and HFS+ filesystems allow the storage of extended attributes with files. These include an arbitrary list of name=value strings, where the names are unique and a value can be accessed through its related name.
3.3.5 PRONOM unique identifiers (PUIDs)
The PRONOM Persistent Unique Identifier (PUID) is an extensible scheme of persistent, unique and unambiguous identifiers for file formats, which has been developed by The National Archives of the UK as part of its PRONOM technical registry service. PUIDs can be expressed as Uniform Resource Identifiers using the info:pronom/ namespace. Although not yet widely used outside of UK government and some digital preservation programmes, the PUID scheme does provide greater granularity than most alternative schemes.
3.3.6 MIME types
4.2 Chunk-based formats
In this kind of file structure, each piece of data is embedded in a container that somehow identifies the data. The container's scope can be identified by start- and end-markers of some kind, by an explicit length field somewhere, or by fixed requirements of the file format's definition.
Throughout the 1970s, many programs used formats of this general kind. Examples include word processors such as troff, Script, and Scribe, and database export files such as CSV. Electronic Arts and Commodore-Amiga also used this type of file format in 1985, with their IFF (Interchange File Format) file format.
SGML and its predecessor IBM GML are among
the earliest examples of such formats.
JSON is similar to XML without schemas, cross-references, or a definition for the meaning of repeated field names, and is often convenient for programmers.
Protocol buffers are in turn similar to JSON, notably replacing boundary-markers in the data with field numbers, which are mapped to/from names by some external mechanism.
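In the Protocol Buffers binary wire format, each field is stored as a key built from its field number and a wire type, followed by the value. The Python sketch below hand-encodes only non-negative integer fields (varint, wire type 0). The SCHEMA dictionary and its field names are hypothetical and stand in for the external mechanism (in practice a compiled .proto definition); this is not the official protobuf library.

```python
# Hypothetical schema: the external mapping between names and field numbers.
SCHEMA = {"id": 1, "count": 2}

def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, least significant group first."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(buf: bytes, pos: int) -> tuple:
    """Return (value, new_position) for the varint starting at pos."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def encode_message(fields: dict) -> bytes:
    """Write each field as key (field number << 3 | wire type 0) plus varint value."""
    out = bytearray()
    for name, value in fields.items():
        out += encode_varint(SCHEMA[name] << 3) + encode_varint(value)
    return bytes(out)

def decode_message(buf: bytes) -> dict:
    """Map field numbers back to names using the same external SCHEMA."""
    names = {number: name for name, number in SCHEMA.items()}
    fields, pos = {}, 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        if key & 0x7 != 0:
            raise ValueError("this sketch only decodes varint fields")
        value, pos = decode_varint(buf, pos)
        fields[names[key >> 3]] = value
    return fields

encoded = encode_message({"id": 150})
print(encoded.hex())            # 089601
print(decode_message(encoded))  # {'id': 150}
```

The bytes 08 96 01 contain no field names at all; decode_message can restore "id" only because it consults the same SCHEMA. JSON, by contrast, repeats the names inside every record.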
4.3 Directory-based formats
See also
Audio file format
Chemical file format
Digital container format
Document file format
DROID file format identification utility
File (command), a file type identification utility
File Formats, Transformation, and Migration (related wikiversity article)
File conversion
Future proofing
Graphics file format summary
Image file formats
List of archive formats
List of file formats
List of filename extensions (alphabetical)
List of free file formats
List of motion and gesture file formats
Magic number (programming)
List of file signatures, or magic numbers
Object file
Video file format
Windows file types
7 External links
Open Directory Data Format links - File types resources on DMOZ
Best Practices for File Formats, US: Stanford University Libraries, Data Management Services (The file formats you use have a direct impact on your ability to open those files at a later date and on the ability of other people to access those data)
8.1 Text
File format Source: https://en.wikipedia.org/wiki/File_format?oldid=756476452 Contributors: Damian Yerrick, Tuxisuau, Brion VIBBER, Bryan Derksen, Zundark, The Anome, Wayne Hardman, Karl E. V. Palmen, PierreAbbat, Ryguasu, Hirzel, B4hand, Lightning~enwiki, Bob Jonkman, Twilsonb, Patrick, RTC, Nixdorf, Psi~enwiki, Tannin, Ahoerstemeier, Stan Shebs, Mac, J-Wiki, Error, IMSoP, Ww, Dysprosia, Markhurd, Furrykef, Joy, AnonMoos, Ldo, Phil Boswell, Robbot, Chealer, RedWolf, Geo97, Yacht, Wikibot,
Wereon, Tea2min, Art Carlson, Hagedis, Dissident, Niteowlneils, Mboverload, AlistairMcMillan, Vadmium, Beland, OverlordQ, Marc
Mongenet, Rich Farmbrough, Smyth, Kop, Kiand, Sajt, Nigelj, Stesmo, R. S. Shaw, Ranma~enwiki, Jrme, Guy Harris, CyberSkull,
Atlant, Mikenolte, Mrtngslr, Kbolino, Simetrical, Reinoutr, Asteron, Uncle G, Jacobolus, Phillipsacp, NeoChaosX, Gengiskanhg, SDC,
Joerg Kurt Wegner, Marudubshinki, Graham87, Raffaele Megabyte, Drrngrvy, Ysangkok, Spudtater, RexNL, Sderose, KirtWalker, DaGizza, ColdFeet, Wavelength, Phantomsteve, Taejo, SpuriousQ, IByte, Yuhong, Gaius Cornelius, Dureo, JulesH, Putz~enwiki, Bota47,
Elkman, CWenger, Ian Fieggen, David Biddulph, Allens, Mhkay, SmackBot, Incnis Mrsi, Adam Mirowski, Shoy, Unyoyega, Gilliam,
Thumperward, Oli Filth, Nbarth, Colonies Chris, Can't sleep, clown will eat me, Abmac, Dreadstar, Warren, Mwtoews, Henning Makholm,
Bogsat, Jna runn, Xandi, Fsuarez2005, WalterMB, 16@r, Loadmaster, EdC~enwiki, Hu12, Shoeofdeath, Robust Physique, Simeon,
RageRiot, Enoch the red, Kozuch, PamD, Thijs!bot, Epbr123, Hasan.Z, Bobblehead, Davidhorman, Philippe, Escarbot, Mentisto, AntiVandalBot, Gioto, OMD, MikeLynch, JAnDbot, MER-C, Malpertuis, Psicorps, Tedickey, Ccodere, Nposs, TheDwoo, Wikianon, Aluvia,
Ineffable3000, Conquerist, MartinBot, STBot, Jim.henderson, Speck-Made, Trusilver, Theo Mark, Petrwiki, Nemo bis, AhmadSherif,
Vinamra2004, Bushcarrot, Cometstyles, Joeinwap, Boijunk, AlnoktaBOT, Matthieu.evrard, Orie0505, Canaima, Jackfork, Wiae, Pious7,
AJRobbins, Phreaka Dude, Arslansheikh, SieBot, Zuurman, Smsarmad, Nickols k, Grndrush, Allmightyduck, Jdaloner, Averagebloke,
Ashenfelder, Withouttrace, The Stickler, StaticGull, ClueBot, GorillaWarfare, The Thing That Should Not Be, Jewers, Enthusiast01, DragonBot, MorrisRob, TobiasPersson, ANOMALY-117, SchreiberBike, La Pianista, Inspector 34, Thingg, Fred4816, MelonBot, InternetMeme, XLinkBot, DotWhat, SilvonenBot, RP459, Addbot, Ghettoblaster, LP, Tanhabot, Cst17, CarsracBot, AnnaFrance, Tassedethe,
Jarble, Yobot, AnomieBOT, Piano non troppo, Padeas, Extensionpedia, A.amitkumar, M2545, Michelin106, HRoestBot, Jonesey95,
Hoo man, Barras, Ahm irf, Vrenator, Lewisluo, Christopherhalliwell, Mean as custard, Dewritech, K6ka, Ronk01, ClueBot NG, SpikeTorontoRCP, Gunnerjack14, Oddbodz, BG19bot, Wiki13, Tom Pippens, HMman, Alikjinda, Hopeoight, BattyBot, ,
Prof. Squirrel, Tagremover, YFdyh-bot, Khazar2, Junior5a, Hmainsbot1, Mogism, Isarra (HG), Bumblebritches57, Condorcraft110, Jemee012, Babitaarora, Wilster.clark, Ugog Nizdast, Ginsuloft, Spacenut42, Neosamardzic1223434343000, Csusarah, Baybaym, Sandradiaz016, CamelCase, Juan j funes, KH-1, Infinite0694, Supdiop, Dr Liton, Ugultopu, Reidgreg, Mxbu41, Justeditingtoday and Anonymous: 275
8.2 Images
File:Ambox_important.svg Source: https://upload.wikimedia.org/wikipedia/commons/b/b4/Ambox_important.svg License: Public domain Contributors: Own work, based off of Image:Ambox scales.svg Original artist: Dsmurat (talk contribs)
File:Question_book-new.svg Source: https://upload.wikimedia.org/wikipedia/en/9/99/Question_book-new.svg License: Cc-by-sa-3.0 Contributors: Created from scratch in Adobe Illustrator. Based on Image:Question book.png created by User:Equazcion Original artist: Tkgd2007
File:Text_document_with_red_question_mark.svg Source: https://upload.wikimedia.org/wikipedia/commons/a/a4/Text_document_with_red_question_mark.svg License: Public domain Contributors: Created by bdesham with Inkscape; based upon Text-x-generic.svg from the Tango project. Original artist: Benjamin D. Esham (bdesham)
File:Wiktionary-logo-v2.svg Source: https://upload.wikimedia.org/wikipedia/commons/0/06/Wiktionary-logo-v2.svg License: CC BY-SA 4.0 Contributors: Own work Original artist: Dan Polansky based on work currently attributed to Wikimedia Foundation but originally created by Smurrayinchester
8.3 Content license