Data Modeling 101

4/20/2017 DataModeling101

The goals of this article are to overview fundamental data modeling skills that all developers should have, skills that can be applied on both traditional projects that take a serial approach and agile projects that take an evolutionary approach. My personal philosophy is that every IT professional should have a basic understanding of data modeling. They don't need to be experts at data modeling, but they should be prepared to be involved in the creation of such a model, be able to read an existing data model, understand when and when not to create a data model, and appreciate fundamental data design techniques. This article is a brief introduction to these skills. The primary audience for this article is application developers who need to gain an understanding of some of the critical activities performed by an Agile DBA. This understanding should lead to an appreciation of what Agile DBAs do and why they do them, and it should help to bridge the communication gap between these two roles.

Table of Contents
1. What is data modeling?
    How are data models used in practice?
    What about conceptual models?
    Common data modeling notations
2. How to model data
    Identify entity types
    Identify attributes
    Apply naming conventions
    Identify relationships
    Apply data model patterns
    Assign keys
    Normalize to reduce data redundancy
    Denormalize to improve performance
3. Evolutionary/agile data modeling
4. How to become better at modeling data

1. What is Data Modeling?
Data modeling is the act of exploring data-oriented structures. Like other modeling artifacts data models can be used for a variety of purposes, from high-level conceptual models to physical data models. From the point of view of an object-oriented developer data modeling is conceptually similar to class modeling. With data modeling you identify entity types whereas with class modeling you identify classes. Data attributes are assigned to entity types just as you would assign attributes and operations to classes. There are associations between entities, similar to the associations between classes: relationships, inheritance, composition, and aggregation are all applicable concepts in data modeling.

Traditional data modeling is different from class modeling because it focuses solely on data: class models allow you to explore both the behavior and data aspects of your domain, whereas with a data model you can only explore data issues. Because of this focus data modelers have a tendency to be much better at getting the data "right" than object modelers. However, some people will model database methods (stored procedures, stored functions, and triggers) when they are physical data modeling. It depends on the situation of course, but I personally think that this is a good idea and promote the concept in my UML data modeling profile (more on this later).

Although the focus of this article is data modeling, there are often alternatives to data-oriented artifacts (never forget Agile Modeling's Multiple Models principle). For example, when it comes to conceptual modeling ORM diagrams aren't your only option. In addition to LDMs it is quite common for people to create UML class diagrams and even Class Responsibility Collaborator (CRC) cards instead. In fact, my experience is that CRC cards are superior to ORM diagrams because it is very easy to get project stakeholders actively involved in the creation of the model. Instead of a traditional, analyst-led drawing session you can facilitate stakeholders through the creation of CRC cards.

1.1 How are Data Models Used in Practice?
Although methodology issues are covered later, we need to discuss how data models can be used in practice to better understand them. You are likely to see three basic styles of data model:

Conceptual data models. These models, sometimes called domain models, are typically used to explore domain concepts with project stakeholders. On Agile teams high-level conceptual models are often created as part of your initial requirements envisioning efforts as they are used to explore the high-level static business structures and concepts. On traditional teams conceptual data models are often created as the precursor to LDMs or as alternatives to LDMs.
Logical data models (LDMs). LDMs are used to explore the domain concepts, and their relationships, of your problem domain. This could be done for the scope of a single project or for your entire enterprise. LDMs depict the logical entity types, typically referred to simply as entity types, the data attributes describing those entities, and the relationships between the entities. LDMs are rarely used on Agile projects although often are on traditional projects (where they rarely seem to add much value in practice).
Physical data models (PDMs). PDMs are used to design the internal schema of a database, depicting the data tables, the data columns of those tables, and the relationships between the tables. PDMs often prove to be useful on both Agile and traditional projects and as a result the focus of this article is on physical modeling.

Although LDMs and PDMs sound very similar, and they in fact are, the level of detail that they model can be significantly different. This is because the goal for each diagram is different: you can use an LDM to explore domain concepts with your stakeholders and the PDM to define your database design. Figure 1 presents a simple LDM and Figure 2 a simple PDM, both modeling the concept of customers and addresses as well as the relationship between them. Both diagrams apply the Barker notation, summarized below. Notice how the PDM shows greater detail, including an associative table required to implement the association as well as the keys needed to maintain the relationships. More on these concepts later. PDMs should also reflect your organization's database naming standards; in this case an abbreviation of the entity name is prefixed to each column name and an abbreviation for "Number" was consistently introduced. A PDM should also indicate the data types for the columns, such as integer and char(5). Although Figure 2 does not show them, lookup tables (also called reference tables or description tables) for how the address is used as well as for states and countries are implied by the attributes ADDR_USAGE_CODE, STATE_CODE, and COUNTRY_CODE.
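
To make the step from logical to physical model more concrete, here is a minimal sketch of what a schema in the spirit of Figure 2 might look like, written with Python's built-in sqlite3 module. Only TCUSTOMER, CUST_FIRST_NAME, CUST_SURNAME, ADDR_USAGE_CODE, STATE_CODE, and COUNTRY_CODE are names taken from this article; every other table name, column name, and data type is an assumption made for illustration and is not a reproduction of the figure.

```python
import sqlite3

# Hypothetical DDL: apart from the names listed in the lead-in, everything
# here (TADDRESS, TCUSTOMER_ADDRESS, CUST_NO, ADDR_ID, the types) is assumed.
PDM_DDL = """
CREATE TABLE TCUSTOMER (
    CUST_NO         INTEGER PRIMARY KEY,
    CUST_FIRST_NAME VARCHAR(20) NOT NULL,
    CUST_SURNAME    VARCHAR(20) NOT NULL
);
CREATE TABLE TADDRESS (
    ADDR_ID      INTEGER PRIMARY KEY,
    ADDR_STREET  VARCHAR(40) NOT NULL,
    STATE_CODE   CHAR(2) NOT NULL,   -- implies a state lookup table
    COUNTRY_CODE CHAR(3) NOT NULL    -- implies a country lookup table
);
-- Associative table that implements the many-to-many relationship.
CREATE TABLE TCUSTOMER_ADDRESS (
    CUST_NO         INTEGER NOT NULL REFERENCES TCUSTOMER (CUST_NO),
    ADDR_ID         INTEGER NOT NULL REFERENCES TADDRESS (ADDR_ID),
    ADDR_USAGE_CODE CHAR(1) NOT NULL,  -- implies an address-usage lookup table
    PRIMARY KEY (CUST_NO, ADDR_ID)
);
"""


def build_pdm() -> sqlite3.Connection:
    """Create the sketch schema in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(PDM_DDL)
    return conn


def table_names(conn: sqlite3.Connection) -> list:
    """Return the names of all tables in the database, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


def column_names(conn: sqlite3.Connection, table: str) -> list:
    """Return the column names of one table, in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
```

Note that SQLite treats declared types such as CHAR(2) as hints only, so a production database would enforce the data types more strictly than this sketch does.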

Figure 1. A simple logical data model.

http://www.agiledata.org/essays/dataModeling101.html 1/9

Figure 2. A simple physical data model.

An important observation about Figures 1 and 2 is that I'm not slavishly following Barker's approach to naming relationships. For example, between Customer and Address there really should be two names: "Each CUSTOMER may be located in one or more ADDRESSES" and "Each ADDRESS may be the site of one or more CUSTOMERS". Although these names explicitly define the relationship I personally think that they're visual noise that clutters the diagram. I prefer simple names such as "has" and then trust my readers to interpret the name in each direction. I'll only add more information where it's needed; in this case I think that it isn't. However, a significant advantage of describing the names the way that Barker suggests is that it's a good test to see if you actually understand the relationship: if you can't name it then you likely don't understand it.

Data models can be used effectively at both the enterprise level and on projects. Enterprise architects will often create one or more high-level LDMs that depict the data structures that support your enterprise, models typically referred to as enterprise data models or enterprise information models. An enterprise data model is one of several views that your organization's enterprise architects may choose to maintain and support; other views may explore your network/hardware infrastructure, your organization structure, your software infrastructure, and your business processes (to name a few). Enterprise data models provide information that a project team can use both as a set of constraints as well as important insights into the structure of their system.

Project teams will typically create LDMs as a primary analysis artifact when their implementation environment is predominantly procedural in nature, for example they are using structured COBOL as an implementation language. LDMs are also a good choice when a project is data-oriented in nature, perhaps a data warehouse or reporting system is being developed (having said that, experience seems to show that usage-centered approaches appear to work even better). However LDMs are often a poor choice when a project team is using object-oriented or component-based technologies because the developers would rather work with UML diagrams or when the project is not data-oriented in nature. As Agile Modeling advises, apply the right artifact(s) for the job. Or, as your grandfather likely advised you, use the right tool for the job. It's important to note that traditional approaches to Master Data Management (MDM) will often motivate the creation and maintenance of detailed LDMs, an effort that is rarely justifiable in practice when you consider the total cost of ownership (TCO) when calculating the return on investment (ROI) of those sorts of efforts.

When a relational database is used for data storage project teams are best advised to create a PDM to model its internal schema. My experience is that a PDM is often one of the critical design artifacts for business application development projects.

1.2 What About Conceptual Models?
Halpin (2001) points out that many data professionals prefer to create an Object-Role Model (ORM), an example is depicted in Figure 3, instead of an LDM for a conceptual model. The advantage is that the notation is very simple, something your project stakeholders can quickly grasp, although the disadvantage is that the models become large very quickly. ORMs enable you to first explore actual data examples instead of simply jumping to a potentially incorrect abstraction; for example Figure 3 examines the relationship between customers and addresses in detail. For more information about ORM, visit www.orm.net.

Figure 3. A simple Object-Role Model.

My experience is that people will capture information in the best place that they know. As a result I typically discard ORMs after I'm finished with them. I sometimes use ORMs to explore the domain with project stakeholders but later replace them with a more traditional artifact such as an LDM, a class diagram, or even a PDM. As a generalizing specialist, someone with one or more specialties who also strives to gain general skills and knowledge, this is an easy decision for me to make; I know that this information that I've just "discarded" will be captured in another artifact (a model, the tests, or even the code) that I understand. A specialist who only understands a limited number of artifacts and therefore "hands off" their work to other specialists doesn't have this as an option. Not only are they tempted to keep the artifacts that they create but also to invest even more time to enhance the artifacts. Generalizing specialists are more likely than specialists to travel light.

1.3 Common Data Modeling Notations
Figure 4 presents a summary of the syntax of four common data modeling notations: Information Engineering (IE), Barker, IDEF1X, and the Unified Modeling Language (UML). This diagram isn't meant to be comprehensive, instead its goal is to provide a basic overview. Furthermore, for the sake of brevity I wasn't able to depict the highly detailed approach to relationship naming that Barker suggests. Although I provide a brief description of each notation in Table 1 I highly suggest David Hay's paper A Comparison of Data Modeling Techniques as he goes into greater detail than I do.

Figure 4. Comparing the syntax of common data modeling notations.

Table 1. Discussing common data modeling notations.

Notation    Comments
IE    The IE notation (Finkelstein 1989) is simple and easy to read, and is well suited for high-level logical and enterprise data modeling. The only drawback of this notation, arguably an advantage, is that it does not support the identification of attributes of an entity. The assumption is that the attributes will be modeled with another diagram or simply described in the supporting documentation.
Barker    The Barker notation is one of the more popular ones, it is supported by Oracle's toolset, and is well suited for all types of data models. Its approach to subtyping can become clunky with hierarchies that go several levels deep.
IDEF1X    This notation is overly complex. It was originally intended for physical modeling but has been misapplied for logical modeling as well. Although popular within some U.S. government agencies, particularly the Department of Defense (DoD), this notation has been all but abandoned by everyone else. Avoid it if you can.
UML    This is not an official data modeling notation (yet). Although several suggestions for a data modeling profile for the UML exist, none are complete and more importantly are not "official" UML yet. However, the Object Management Group (OMG) in December 2005 announced an RFP for data-oriented models.


3. How to Model Data
It is critical for an application developer to have a grasp of the fundamentals of data modeling so they can not only read data models but also work effectively with Agile DBAs who are responsible for the data-oriented aspects of your project. Your goal reading this section is not to learn how to become a data modeler, instead it is simply to gain an appreciation of what is involved.

The following tasks are performed in an iterative manner:

Identify entity types
Identify attributes
Apply naming conventions
Identify relationships
Apply data model patterns
Assign keys
Normalize to reduce data redundancy
Denormalize to improve performance

Very good practical books about data modeling include Joe Celko's Data & Databases and Data Modeling for Information Professionals as they both focus on practical issues with data modeling. The Data Modeling Handbook and Data Model Patterns are both excellent resources once you've mastered the fundamentals. An Introduction to Database Systems is a good academic treatise for anyone wishing to become a data specialist.

3.1 Identify Entity Types

An entity type, also simply called entity (not exactly accurate terminology, but very common in practice), is similar conceptually to object-orientation's concept of a class: an entity type represents a collection of similar objects. An entity type could represent a collection of people, places, things, events, or concepts. Examples of entities in an order entry system would include Customer, Address, Order, Item, and Tax. If you were class modeling you would expect to discover classes with the exact same names. However, the difference between a class and an entity type is that classes have both data and behavior whereas entity types just have data.

Ideally an entity should be normal, the data modeling world's version of cohesive. A normal entity depicts one concept, just like a cohesive class models one concept. For example, customer and order are clearly two different concepts; therefore it makes sense to model them as separate entities.
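
The distinction between an entity type (data only) and a class (data plus behavior) can be sketched in a few lines of Python. This is only an illustration: the attribute names and the full_name operation are assumptions chosen for the example, not part of any model in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerEntity:
    """Entity type: a bundle of data attributes and nothing else."""
    customer_number: int
    first_name: str
    surname: str


class Customer:
    """Class: the same data attributes plus behavior that operates on them."""

    def __init__(self, customer_number: int, first_name: str, surname: str):
        self.customer_number = customer_number
        self.first_name = first_name
        self.surname = surname

    def full_name(self) -> str:
        # Behavior lives with the data in a class model.
        return f"{self.first_name} {self.surname}"
```

A data model would capture only what CustomerEntity shows; a class model would also capture operations such as full_name.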

3.2 Identify Attributes
Each entity type will have one or more data attributes. For example, in Figure 1 you saw that the Customer entity has attributes such as First Name and Surname and in Figure 2 that the TCUSTOMER table had corresponding data columns CUST_FIRST_NAME and CUST_SURNAME (a column is the implementation of a data attribute within a relational database).

Attributes should also be cohesive from the point of view of your domain, something that is often a judgment call. In Figure 1 we decided that we wanted to model the fact that people had both first and last names instead of just a name (e.g. "Scott" and "Ambler" vs. "Scott Ambler") whereas we did not distinguish between the sections of an American zip code (e.g. 90210-1234-5678). Getting the level of detail right can have a significant impact on your development and maintenance efforts. Refactoring a single data column into several columns can be difficult (database refactoring is described in detail in Database Refactoring), although over-specifying an attribute (e.g. having three attributes for zip code when you only needed one) can result in overbuilding your system and hence you incur greater development and maintenance costs than you actually needed.
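
To give a feel for why splitting a column after the fact takes effort, here is a rough sketch of refactoring a single name column into two columns using Python's sqlite3 module. The customer table, its column names, and the split-on-first-space rule are all assumptions made for this example; a real refactoring would also have to handle names that do not fit that rule and update every program that reads the old column.

```python
import sqlite3


def split_name_column(conn: sqlite3.Connection) -> None:
    """Split customer.name into first_name and surname (hypothetical schema)."""
    conn.execute("ALTER TABLE customer ADD COLUMN first_name TEXT")
    conn.execute("ALTER TABLE customer ADD COLUMN surname TEXT")
    rows = conn.execute("SELECT customer_number, name FROM customer").fetchall()
    for customer_number, name in rows:
        # Naive rule: everything before the first space is the first name.
        first_name, _, surname = name.partition(" ")
        conn.execute(
            "UPDATE customer SET first_name = ?, surname = ? "
            "WHERE customer_number = ?",
            (first_name, surname, customer_number),
        )
    conn.commit()
```

Even in this small sketch the old name column is left in place, because dropping it safely depends on knowing that nothing else still uses it.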

3.3 Apply Data Naming Conventions
Your organization should have standards and guidelines applicable to data modeling, something you should be able to obtain from your enterprise administrators (if they don't exist you should lobby to have some put in place). These guidelines should include naming conventions for both logical and physical modeling; the logical naming conventions should be focused on human readability whereas the physical naming conventions will reflect technical considerations. You can clearly see that different naming conventions were applied in Figures 1 and 2.

As you saw in Introduction to Agile Modeling, AM includes the Apply Modeling Standards practice. The basic idea is that developers should agree to and follow a common set of modeling standards on a software project. Just like there is value in following common coding conventions (clean code that follows your chosen coding guidelines is easier to understand and evolve than code that doesn't), there is similar value in following common modeling conventions.

3.4 Identify Relationships
In the real world entities have relationships with other entities. For example, customers PLACE orders, customers LIVE AT addresses, and line items ARE PART OF orders. Place, live at, and are part of are all terms that define relationships between entities. The relationships between entities are conceptually identical to the relationships (associations) between objects.

Figure 5 depicts a partial LDM for an online ordering system. The first thing to notice is the various styles applied to relationship names and roles: different relationships require different approaches. For example the relationship between Customer and Order has two names, places and is placed by, whereas the relationship between Customer and Address has one. In this example having a second name on the relationship, the idea being that you want to specify how to read the relationship in each direction, is redundant; you're better off finding a clear wording for a single relationship name, decreasing the clutter on your diagram. Similarly you will often find that specifying the roles that an entity plays in a relationship negates the need to give the relationship a name (although some CASE tools may inadvertently force you to do this). For example the role of billing address and the label billed to are clearly redundant, you really only need one. Likewise the role part of that Line Item has in its relationship with Order is sufficiently obvious without a relationship name.

Figure 5. A logical data model (Information Engineering notation).


You also need to identify the cardinality and optionality of a relationship (the UML combines the concepts of optionality and cardinality into the single concept of multiplicity). Cardinality represents the concept of "how many" whereas optionality represents the concept of "whether you must have something." For example, it is not enough to know that customers place orders. How many orders can a customer place? None, one, or several? Furthermore, relationships are two-way streets: not only do customers place orders, but orders are placed by customers. This leads to questions like: how many customers can be involved in any given order and is it possible to have an order with no customer involved? Figure 5 shows that customers place zero or more orders and that any given order is placed by one customer and one customer only. It also shows that a customer lives at one or more addresses and that any given address has zero or more customers living at it.
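
One way to see how cardinality and optionality end up in a physical schema is the following sketch, which uses Python's sqlite3 module. It encodes only the Customer to Order relationship described above: a customer may have zero or more orders, and every order must belong to exactly one customer. The table and column names are assumptions for illustration and are not taken from Figure 5.

```python
import sqlite3


def build_order_schema() -> sqlite3.Connection:
    """Create hypothetical customer and customer_order tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript("""
        CREATE TABLE customer (
            customer_number INTEGER PRIMARY KEY,
            surname         TEXT NOT NULL
        );
        CREATE TABLE customer_order (
            order_id        INTEGER PRIMARY KEY,
            -- NOT NULL makes the customer mandatory (optionality);
            -- a single column means exactly one customer (cardinality).
            customer_number INTEGER NOT NULL
                REFERENCES customer (customer_number)
        );
    """)
    return conn


def can_insert_order(conn: sqlite3.Connection, order_id, customer_number) -> bool:
    """Try to insert an order; report whether the constraints allowed it."""
    try:
        conn.execute(
            "INSERT INTO customer_order VALUES (?, ?)",
            (order_id, customer_number),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Nothing in this schema forces a customer to have an order, which matches "zero or more". A rule such as "a customer lives at one or more addresses" is harder, because a mandatory "at least one" on the many side generally cannot be expressed with a simple column constraint and is usually checked by application code or triggers.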

Although the UML distinguishes between different types of relationships (associations, inheritance, aggregation, composition, and dependency) data modelers often aren't as concerned with this issue as object modelers are. Subtyping, one application of inheritance, is often found in data models, an example of which is the is a relationship between Item and its two "sub entities" Service and Product. Aggregation and composition are much less common and typically must be implied from the data model, as you see with the part of role that Line Item takes with Order. UML dependencies are typically a software construct and therefore wouldn't appear on a data model, unless of course it was a very highly detailed physical model that showed how views, triggers, or stored procedures depended on other aspects of the database schema.

3.5 Apply Data Model Patterns

Some data modelers will apply common data model patterns, David Hay's book Data Model Patterns is the best reference on the subject, just as object-oriented developers will apply analysis patterns (Fowler 1997; Ambler 1997) and design patterns (Gamma et al. 1995). Data model patterns are conceptually closest to analysis patterns because they describe solutions to common domain issues. Hay's book is a very good reference for anyone involved in analysis-level modeling, even when you're taking an object approach instead of a data approach because his patterns model business structures from a wide variety of business domains.

3.6 Assign Keys
There are two fundamental strategies for assigning keys to tables. First, you could assign a natural key which is one or more existing data attributes that are unique to the business concept. The Customer table of Figure 6 has two candidate keys, in this case CustomerNumber and SocialSecurityNumber. Second, you could introduce a new column, called a surrogate key, which is a key that has no business meaning. An example is the AddressID column of the Address table in Figure 6. Addresses don't have an "easy" natural key because you would need to use all of the columns of the Address table to form a key for itself (you might be able to get away with just the combination of Street and ZipCode depending on your problem domain), therefore introducing a surrogate key is a much better option in this case.

Figure 6. Customer and Address revisited (UML notation).

Let's consider Figure 6 in more detail. Figure 6 presents an alternative design to that presented in Figure 2; a different naming convention was adopted and the model itself is more extensive. In Figure 6 the Customer table has the CustomerNumber column as its primary key and SocialSecurityNumber as an alternate key. This indicates that the preferred way to access customer information is through the value of a person's customer number although your software can get at the same information if it has the person's social security number. The CustomerHasAddress table has a composite primary key, the combination of CustomerNumber and AddressID. A foreign key is one or more attributes in an entity type that represents a key, either primary or secondary, in another entity type. Foreign keys are used to maintain relationships between rows. For example, the relationship between rows in the CustomerHasAddress table and the Customer table is maintained by the CustomerNumber column within the CustomerHasAddress table. The interesting thing about the CustomerNumber column is the fact that it is part of the primary key for CustomerHasAddress as well as the foreign key to the Customer table. Similarly, the AddressID column is part of the primary key of CustomerHasAddress as well as a foreign key to the Address table to maintain the relationship with rows of Address.

Although the "natural vs. surrogate" debate is one of the great religious issues within the data community, the fact is that neither strategy is perfect and you'll discover that in practice (as we see in Figure 6) sometimes it makes sense to use natural keys and sometimes it makes sense to use surrogate keys. In Choosing a Primary Key: Natural or Surrogate? I describe the relevant issues in detail.
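
The key concepts in this section (primary, alternate, surrogate, composite, and foreign keys) can be sketched as follows with Python's sqlite3 module. The table names and the CustomerNumber, SocialSecurityNumber, AddressID, Street, and ZipCode columns follow the description of Figure 6 above; the Surname column, the data types, and the helper functions are assumptions added for the example.

```python
import sqlite3

KEYS_DDL = """
CREATE TABLE Customer (
    CustomerNumber       INTEGER PRIMARY KEY,   -- natural primary key
    SocialSecurityNumber TEXT NOT NULL UNIQUE,  -- alternate key
    Surname              TEXT NOT NULL          -- assumed column
);
CREATE TABLE Address (
    AddressID INTEGER PRIMARY KEY,              -- surrogate key, no business meaning
    Street    TEXT NOT NULL,
    ZipCode   TEXT NOT NULL
);
CREATE TABLE CustomerHasAddress (
    CustomerNumber INTEGER NOT NULL REFERENCES Customer (CustomerNumber),
    AddressID      INTEGER NOT NULL REFERENCES Address (AddressID),
    PRIMARY KEY (CustomerNumber, AddressID)     -- composite primary key
);
"""


def build_keys_schema() -> sqlite3.Connection:
    """Create the three tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(KEYS_DDL)
    return conn


def find_customer(conn: sqlite3.Connection, *, number=None, ssn=None):
    """Look a customer up by primary key, or by alternate key if no number is given."""
    if number is not None:
        where, arg = "CustomerNumber = ?", number
    else:
        where, arg = "SocialSecurityNumber = ?", ssn
    sql = f"SELECT CustomerNumber, Surname FROM Customer WHERE {where}"
    return conn.execute(sql, (arg,)).fetchone()
```

Both columns of CustomerHasAddress do double duty here: each is part of the composite primary key and is also a foreign key to its parent table.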

3.7 Normalize to Reduce Data Redundancy

Data normalization is a process in which data attributes within a data model are organized to increase the cohesion of entity types. In other words, the goal of data normalization is to reduce and even eliminate data redundancy, an important consideration for application developers because it is incredibly difficult to store objects in a relational database that maintains the same information in several places. Table 2 summarizes the three most common normalization rules describing how to put entity types into a series of increasing levels of normalization. Higher levels of data normalization (Date 2000) are beyond the scope of this article. With respect to terminology, a data schema is considered to be at the level of normalization of its least normalized entity type. For example, if all of your entity types are at second normal form (2NF) or higher then we say that your data schema is at 2NF.


Table 2. Data Normalization Rules.

Level    Rule
First normal form (1NF)    An entity type is in 1NF when it contains no repeating groups of data.
Second normal form (2NF)    An entity type is in 2NF when it is in 1NF and when all of its non-key attributes are fully dependent on its primary key.
Third normal form (3NF)    An entity type is in 3NF when it is in 2NF and when all of its attributes are directly dependent on the primary key.

Figure 7 depicts a database schema in 0NF whereas Figure 8 depicts a normalized schema in 3NF. Read the Introduction to Data Normalization essay for details.

Why data normalization? The advantage of having a highly normalized data schema is that information is stored in one place and one place only, reducing the possibility of inconsistent data. Furthermore, highly normalized data schemas in general are closer conceptually to object-oriented schemas because the object-oriented goals of promoting high cohesion and loose coupling between classes result in similar solutions (at least from a data point of view). This generally makes it easier to map your objects to your data schema. Unfortunately, normalization usually comes at a performance cost. With the data schema of Figure 7 all the data for a single order is stored in one row (assuming orders of up to nine order items), making it very easy to access. With the data schema of Figure 7 you could quickly determine the total amount of an order by reading the single row from the Order0NF table. To do so with the data schema of Figure 8 you would need to read data from a row in the Order table, data from all the rows from the OrderItem table for that order and data from the corresponding rows in the Item table for each order item. For this query, the data schema of Figure 7 very likely provides better performance.
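
The trade-off just described can be sketched with two tiny schemas in Python's sqlite3 module. This is a simplified illustration, not the schemas of Figures 7 and 8: the unnormalized table has only two repeating item slots instead of nine, and all table and column names (including orders, used because ORDER is a reserved word in SQL) are assumptions.

```python
import sqlite3


def build_both_schemas() -> sqlite3.Connection:
    """Create a 0NF-style table and a 3NF-style set of tables with the same order."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Unnormalized: repeating groups of item columns in a single row.
        CREATE TABLE order0nf (
            order_id    INTEGER PRIMARY KEY,
            item1_price REAL, item1_qty INTEGER,
            item2_price REAL, item2_qty INTEGER
        );
        -- Normalized: each fact is stored in one place only.
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
        CREATE TABLE item (item_id INTEGER PRIMARY KEY, price REAL NOT NULL);
        CREATE TABLE order_item (
            order_id INTEGER NOT NULL REFERENCES orders (order_id),
            item_id  INTEGER NOT NULL REFERENCES item (item_id),
            quantity INTEGER NOT NULL,
            PRIMARY KEY (order_id, item_id)
        );
        INSERT INTO order0nf VALUES (1, 10.0, 2, 5.0, 3);
        INSERT INTO orders VALUES (1);
        INSERT INTO item VALUES (100, 10.0), (200, 5.0);
        INSERT INTO order_item VALUES (1, 100, 2), (1, 200, 3);
    """)
    return conn


def total_0nf(conn: sqlite3.Connection, order_id: int) -> float:
    """One row read: everything needed is in the order row itself."""
    row = conn.execute(
        "SELECT item1_price * item1_qty + item2_price * item2_qty "
        "FROM order0nf WHERE order_id = ?",
        (order_id,),
    ).fetchone()
    return row[0]


def total_3nf(conn: sqlite3.Connection, order_id: int) -> float:
    """Three tables joined: the order, its order items, and their items."""
    row = conn.execute(
        "SELECT SUM(i.price * oi.quantity) "
        "FROM orders o "
        "JOIN order_item oi ON oi.order_id = o.order_id "
        "JOIN item i ON i.item_id = oi.item_id "
        "WHERE o.order_id = ?",
        (order_id,),
    ).fetchone()
    return row[0]
```

Both functions return the same total, but the normalized version touches three tables to get it, while a price change in the normalized schema needs only one row of item to be updated.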

Figure 7. An Initial Data Schema for Order (UML Notation).

Figure 8. A normalized schema in 3NF (UML Notation).


In class modeling, there is a similar concept called Class Normalization although that is beyond the scope of this article.

3.8 Denormalize to Improve Performance
Normalized data schemas, when put into production, often suffer from performance problems. This makes sense: the rules of data normalization focus on reducing data redundancy, not on improving performance of data access. An important part of data modeling is to denormalize portions of your data schema to improve database access times. For example, the data model of Figure 9 looks nothing like the normalized schema of Figure 8. To understand why the differences between the schemas exist you must consider the performance needs of the application. The primary goal of this system is to process new orders from online customers as quickly as possible. To do this customers need to be able to search for items and add them to their order quickly, remove items from their order if need be, then have their final order totaled and recorded quickly. The secondary goal of the system is to then process, ship, and bill the orders afterwards.

Figure 9. A Denormalized Order Data Schema (UML notation).


To denormalize the data schema the following decisions were made:

1. To support quick searching of item information the Item table was left alone.
2. To support the addition and removal of order items to an order the concept of an OrderItem table was kept, albeit split in two to support outstanding orders and fulfilled orders. New order items can easily be inserted into the OutstandingOrderItem table, or removed from it, as needed.
3. To support order processing the Order and OrderItem tables were reworked into pairs to handle outstanding and fulfilled orders respectively. Basic order information is first stored in the OutstandingOrder and OutstandingOrderItem tables and then when the order has been shipped and paid for the data is then removed from those tables and copied into the FulfilledOrder and FulfilledOrderItem tables respectively. Data access time to the two tables for outstanding orders is reduced because only the active orders are being stored there. On average an order may be outstanding for a couple of days, whereas for financial reporting reasons it may be stored in the fulfilled order tables for several years until archived. There is a performance penalty under this scheme because of the need to delete outstanding orders and then resave them as fulfilled orders, clearly something that would need to be processed as a transaction.
4. The contact information for the person(s) the order is being shipped and billed to was also denormalized back into the Order table, reducing the time it takes to write an order to the database because there is now one write instead of two or three. The retrieval and deletion times for that data would also be similarly improved.
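
Decision 3 above notes that moving an order from the outstanding tables to the fulfilled tables must happen as a single transaction. A minimal sketch of that move, using Python's sqlite3 module, might look like the following. The four table names come from the text; the columns (OrderID, ShipToName, ItemID, Quantity) and the function names are assumptions made for the example.

```python
import sqlite3


def build_fulfilment_schema() -> sqlite3.Connection:
    """Create hypothetical outstanding and fulfilled order tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE OutstandingOrder     (OrderID INTEGER PRIMARY KEY, ShipToName TEXT);
        CREATE TABLE OutstandingOrderItem (OrderID INTEGER, ItemID INTEGER, Quantity INTEGER,
                                           PRIMARY KEY (OrderID, ItemID));
        CREATE TABLE FulfilledOrder       (OrderID INTEGER PRIMARY KEY, ShipToName TEXT);
        CREATE TABLE FulfilledOrderItem   (OrderID INTEGER, ItemID INTEGER, Quantity INTEGER,
                                           PRIMARY KEY (OrderID, ItemID));
    """)
    return conn


def fulfil_order(conn: sqlite3.Connection, order_id: int) -> None:
    """Copy an order to the fulfilled tables and delete it from the outstanding ones."""
    # "with conn" commits if every statement succeeds and rolls back if any
    # of them raises, so the order is never left half moved.
    with conn:
        conn.execute(
            "INSERT INTO FulfilledOrder SELECT * FROM OutstandingOrder WHERE OrderID = ?",
            (order_id,),
        )
        conn.execute(
            "INSERT INTO FulfilledOrderItem "
            "SELECT * FROM OutstandingOrderItem WHERE OrderID = ?",
            (order_id,),
        )
        conn.execute("DELETE FROM OutstandingOrderItem WHERE OrderID = ?", (order_id,))
        conn.execute("DELETE FROM OutstandingOrder WHERE OrderID = ?", (order_id,))
```

The four statements are the performance penalty the text mentions: the same data is written once, read once, written again, and deleted, in exchange for keeping the outstanding tables small.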

Note that if your initial, normalized data design meets the performance needs of your application then it is fine as is. Denormalization should be resorted to only when performance testing shows that you have a problem with your objects and subsequent profiling reveals that you need to improve database access time. As my grandfather said, if it ain't broke don't fix it.

5. Evolutionary/Agile Data Modeling
Evolutionary data modeling is data modeling performed in an iterative and incremental manner. The article Evolutionary Development explores evolutionary software development in greater detail. Agile data modeling is evolutionary data modeling done in a collaborative manner. The article Agile Data Modeling: From Domain Modeling to Physical Modeling works through a case study which shows how to take an agile approach to data modeling.

Although you wouldn't think it, data modeling can be one of the most challenging tasks that an Agile DBA can be involved with on an agile software development project. Your approach to data modeling will often be at the center of any controversy between the agile software developers and the traditional data professionals within your organization. Agile software developers will lean towards an evolutionary approach where data modeling is just one of many activities whereas traditional data professionals will often lean towards a big design up front (BDUF) approach where data models are the primary artifacts, if not THE artifacts. This problem results from a combination of the cultural impedance mismatch, a misguided need to enforce the "one truth", and "normal" political maneuvering within your organization. As a result Agile DBAs often find that navigating the political waters is an important part of their data modeling efforts.

6. How to Become Better At Modeling Data
How do you improve your data modeling skills? Practice, practice, practice. Whenever you get a chance you should work closely with Agile DBAs, volunteer to model data with them, and ask them questions as the work progresses. Agile DBAs will be following the AM practice Model With Others so should welcome the assistance as well as the questions; one of the best ways to really learn your craft is to have someone ask "why are you doing it that way?". You should be able to learn physical data modeling skills from Agile DBAs, and often logical data modeling skills as well.

Similarly you should take the opportunity to work with the enterprise architects within your organization. As you saw in Agile Enterprise Architecture they should be taking an active role on your project, mentoring your project team in the enterprise architecture (if any), mentoring you in modeling and architectural skills, and aiding in your team's modeling and development efforts. Once again, volunteer to work with them and ask questions when you are doing so. Enterprise architects will be able to teach you conceptual and logical data modeling skills as well as instill an appreciation for enterprise issues.

You also need to do some reading. Although this article is a good start it is only a brief introduction. The best approach is to simply ask the Agile DBAs that you work with what they think you should read.

My final word of advice is that it is critical for application developers to understand and appreciate the fundamentals of data modeling. This is a valuable skill to have and has been since the 1970s. It also provides a common framework within which you can work with Agile DBAs, and may even prove to be the initial skill that enables you to make a career transition into becoming a full-fledged Agile DBA.



Recommended Reading

This book, Disciplined Agile Delivery: A Practitioner's Guide to Agile Software Delivery in the Enterprise, describes the Disciplined Agile Delivery (DAD) process decision framework. The DAD framework is a people-first, learning-oriented hybrid agile approach to IT solution delivery. It has a risk-value delivery lifecycle, is goal-driven, is enterprise aware, and provides the foundation for scaling agile. This book is particularly important for anyone who wants to understand how agile works from end to end within an enterprise setting. Data professionals will find it interesting because it shows how agile modeling and agile database techniques fit into the overall solution delivery process. Enterprise professionals will find it interesting because it explicitly promotes the idea that disciplined agile teams should be enterprise aware and therefore work closely with enterprise teams. Existing agile developers will find it interesting because it shows how to extend Scrum-based and Kanban-based strategies to provide a coherent, end-to-end streamlined delivery process.

I also maintain an agile database books page which overviews many books you will find interesting.

Copyright 2002-2013 Ambysoft Inc.
This site owned by Ambysoft Inc.
