Data Modeling 101
The goals of this article are to overview fundamental data modeling skills that all developers should have, skills that can be applied on both traditional projects that take a serial approach and agile projects that take an evolutionary approach. My personal philosophy is that every IT professional should have a basic understanding of data modeling. They don't need to be experts at data modeling, but they should be prepared to be involved in the creation of such a model, be able to read an existing data model, understand when and when not to create a data model, and appreciate fundamental data design techniques. This article is a brief introduction to these skills. The primary audience for this article is application developers who need to gain an understanding of some of the critical activities performed by an Agile DBA. This understanding should lead to an appreciation of what Agile DBAs do and why they do them, and it should help to bridge the communication gap between these two roles.
Table of Contents
1. What is data modeling?
   How are data models used in practice?
   What about conceptual models?
   Common data modeling notations
2. How to model data
   Identify entity types
   Identify attributes
   Apply naming conventions
   Identify relationships
   Apply data model patterns
   Assign keys
   Normalize to reduce data redundancy
   Denormalize to improve performance
3. Evolutionary/agile data modeling
4. How to become better at modeling data
1. What is Data Modeling?
Data modeling is the act of exploring data-oriented structures. Like other modeling artifacts, data models can be used for a variety of purposes, from high-level conceptual models to physical data models. From the point of view of an object-oriented developer, data modeling is conceptually similar to class modeling. With data modeling you identify entity types whereas with class modeling you identify classes. Data attributes are assigned to entity types just as you would assign attributes and operations to classes. There are associations between entities, similar to the associations between classes: relationships, inheritance, composition, and aggregation are all applicable concepts in data modeling.
Traditional data modeling is different from class modeling because it focuses solely on data. Class models allow you to explore both the behavior and data aspects of your domain, whereas with a data model you can only explore data issues. Because of this focus data modelers have a tendency to be much better at getting the data "right" than object modelers. However, some people will model database methods (stored procedures, stored functions, and triggers) when they are physical data modeling. It depends on the situation of course, but I personally think that this is a good idea and promote the concept in my UML data modeling profile (more on this later).
Although the focus of this article is data modeling, there are often alternatives to data-oriented artifacts (never forget Agile Modeling's Multiple Models principle). For example, when it comes to conceptual modeling, Object Role Model (ORM) diagrams aren't your only option. In addition to logical data models (LDMs) it is quite common for people to create UML class diagrams and even Class Responsibility Collaborator (CRC) cards instead. In fact, my experience is that CRC cards are superior to ORM diagrams because it is very easy to get project stakeholders actively involved in the creation of the model. Instead of a traditional, analyst-led drawing session you can facilitate stakeholders through the creation of CRC cards.
1.1 How are Data Models Used in Practice?
Although methodology issues are covered later, we need to discuss how data models can be used in practice to better understand them. You are likely to see three basic styles of data model:
Conceptual data models. These models, sometimes called domain models, are typically used to explore domain concepts with project stakeholders. On Agile teams high-level conceptual models are often created as part of your initial requirements envisioning efforts as they are used to explore the high-level static business structures and concepts. On traditional teams conceptual data models are often created as the precursor to LDMs or as alternatives to LDMs.
Logical data models (LDMs). LDMs are used to explore the domain concepts, and their relationships, of your problem domain. This could be done for the scope of a single project or for your entire enterprise. LDMs depict the logical entity types, typically referred to simply as entity types, the data attributes describing those entities, and the relationships between the entities. LDMs are rarely used on Agile projects although they often are on traditional projects (where they rarely seem to add much value in practice).
Physical data models (PDMs). PDMs are used to design the internal schema of a database, depicting the data tables, the data columns of those tables, and the relationships between the tables. PDMs often prove to be useful on both Agile and traditional projects and as a result the focus of this article is on physical modeling.
Although LDMs and PDMs sound very similar, and they in fact are, the level of detail that they model can be significantly different. This is because the goals for each diagram are different: you can use an LDM to explore domain concepts with your stakeholders and the PDM to define your database design. Figure 1 presents a simple LDM and Figure 2 a simple PDM, both modeling the concept of customers and addresses as well as the relationship between them. Both diagrams apply the Barker notation, summarized below. Notice how the PDM shows greater detail, including an associative table required to implement the association as well as the keys needed to maintain the relationships. More on these concepts later. PDMs should also reflect your organization's database naming standards; in this case an abbreviation of the entity name is prepended to each column name and an abbreviation for "Number" was consistently introduced. A PDM should also indicate the data types for the columns, such as integer and char(5). Although Figure 2 does not show them, lookup tables (also called reference tables or description tables) for how the address is used as well as for states and countries are implied by the attributes ADDR_USAGE_CODE, STATE_CODE, and COUNTRY_CODE.
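To make the extra detail of a physical model concrete, here is a minimal sketch of how a schema along the lines of Figure 2 might be declared using Python's built-in sqlite3 module. Only TCUSTOMER, CUST_FIRST_NAME, CUST_SURNAME, ADDR_USAGE_CODE, STATE_CODE, and COUNTRY_CODE come from the text; every other table name, column name, type, and length is an assumption made for illustration.

```python
import sqlite3

# Hypothetical DDL in the spirit of Figure 2. TADDRESS, TCUSTOMER_ADDRESS,
# CUST_NO, ADDR_ID, ADDR_STREET and all types and lengths are assumed.
PDM_DDL = """
CREATE TABLE TCUSTOMER (
    CUST_NO          INTEGER PRIMARY KEY,
    CUST_FIRST_NAME  VARCHAR(30) NOT NULL,
    CUST_SURNAME     VARCHAR(30) NOT NULL
);
CREATE TABLE TADDRESS (
    ADDR_ID       INTEGER PRIMARY KEY,
    ADDR_STREET   VARCHAR(60) NOT NULL,
    STATE_CODE    CHAR(2)     NOT NULL,
    COUNTRY_CODE  CHAR(3)     NOT NULL
);
-- Associative table that implements the many-to-many relationship.
CREATE TABLE TCUSTOMER_ADDRESS (
    CUST_NO          INTEGER NOT NULL REFERENCES TCUSTOMER (CUST_NO),
    ADDR_ID          INTEGER NOT NULL REFERENCES TADDRESS (ADDR_ID),
    ADDR_USAGE_CODE  CHAR(1) NOT NULL,
    PRIMARY KEY (CUST_NO, ADDR_ID)
);
"""


def build_pdm() -> sqlite3.Connection:
    """Create the sketch schema in a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(PDM_DDL)
    return conn


def table_names(conn: sqlite3.Connection) -> list:
    """List the tables that now exist, in alphabetical order."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]
```

Calling table_names(build_pdm()) lists the three tables, including the associative table that the logical model does not need to show.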
Figure 1. A simple logical data model.
http://www.agiledata.org/essays/dataModeling101.html 1/9
4/20/2017 DataModeling101
Figure 2. A simple physical data model.
An important observation about Figures 1 and 2 is that I'm not slavishly following Barker's approach to naming relationships. For example, between Customer and Address there really should be two names: "Each CUSTOMER may be located in one or more ADDRESSES" and "Each ADDRESS may be the site of one or more CUSTOMERS". Although these names explicitly define the relationship I personally think that they're visual noise that clutters the diagram. I prefer simple names such as "has" and then trust my readers to interpret the name in each direction. I'll only add more information where it's needed; in this case I think that it isn't. However, a significant advantage of describing the names the way that Barker suggests is that it's a good test to see if you actually understand the relationship: if you can't name it then you likely don't understand it.
Data models can be used effectively at both the enterprise level and on projects. Enterprise architects will often create one or more high-level LDMs that depict the data structures that support your enterprise, models typically referred to as enterprise data models or enterprise information models. An enterprise data model is one of several views that your organization's enterprise architects may choose to maintain and support; other views may explore your network/hardware infrastructure, your organization structure, your software infrastructure, and your business processes (to name a few). Enterprise data models provide information that a project team can use both as a set of constraints as well as important insights into the structure of their system.
Project teams will typically create LDMs as a primary analysis artifact when their implementation environment is predominantly procedural in nature, for example when they are using structured COBOL as an implementation language. LDMs are also a good choice when a project is data-oriented in nature, perhaps because a data warehouse or reporting system is being developed (having said that, experience seems to show that usage-centered approaches appear to work even better). However, LDMs are often a poor choice when a project team is using object-oriented or component-based technologies, because the developers would rather work with UML diagrams, or when the project is not data-oriented in nature. As Agile Modeling advises, apply the right artifact(s) for the job. Or, as your grandfather likely advised you, use the right tool for the job. It's important to note that traditional approaches to Master Data Management (MDM) will often motivate the creation and maintenance of detailed LDMs, an effort that is rarely justifiable in practice when you consider the total cost of ownership (TCO) when calculating the return on investment (ROI) of those sorts of efforts.
When a relational database is used for data storage, project teams are best advised to create a PDM to model its internal schema. My experience is that a PDM is often one of the critical design artifacts for business application development projects.
2.2. What About Conceptual Models?
Halpin (2001) points out that many data professionals prefer to create an Object Role Model (ORM), an example of which is depicted in Figure 3, instead of an LDM for a conceptual model. The advantage is that the notation is very simple, something your project stakeholders can quickly grasp, although the disadvantage is that the models become large very quickly. ORMs enable you to first explore actual data examples instead of simply jumping to a potentially incorrect abstraction; for example Figure 3 examines the relationship between customers and addresses in detail. For more information about ORM, visit www.orm.net.
Figure 3. A simple Object Role Model.
My experience is that people will capture information in the best place that they know. As a result I typically discard ORMs after I'm finished with them. I sometimes use ORMs to explore the domain with project stakeholders but later replace them with a more traditional artifact such as an LDM, a class diagram, or even a PDM. As a generalizing specialist, someone with one or more specialties who also strives to gain general skills and knowledge, this is an easy decision for me to make: I know that the information that I've just "discarded" will be captured in another artifact (a model, the tests, or even the code) that I understand. A specialist who only understands a limited number of artifacts and therefore "hands off" their work to other specialists doesn't have this as an option. Not only are they tempted to keep the artifacts that they create but also to invest even more time to enhance the artifacts. Generalizing specialists are more likely than specialists to travel light.
2.3. Common Data Modeling Notations
Figure 4 presents a summary of the syntax of four common data modeling notations: Information Engineering (IE), Barker, IDEF1X, and the Unified Modeling Language (UML). This diagram isn't meant to be comprehensive, instead its goal is to provide a basic overview. Furthermore, for the sake of brevity I wasn't able to depict the highly detailed approach to relationship naming that Barker suggests. Although I provide a brief description of each notation in Table 1, I highly suggest David Hay's paper A Comparison of Data Modeling Techniques as he goes into greater detail than I do.
Figure 4. Comparing the syntax of common data modeling notations.
Table 1. Discussing common data modeling notations.
Notation    Comments
IE          The IE notation (Finkelstein 1989) is simple and easy to read, and is well suited for high-level logical and enterprise data modeling. The only drawback of this notation, arguably an advantage, is that it does not support the identification of attributes of an entity. The assumption is that the attributes will be modeled with another diagram or simply described in the supporting documentation.
Barker      The Barker notation is one of the more popular ones, it is supported by Oracle's toolset, and is well suited for all types of data models. Its approach to subtyping can become clunky with hierarchies that go several levels deep.
IDEF1X      This notation is overly complex. It was originally intended for physical modeling but has been misapplied for logical modeling as well. Although popular within some U.S. government agencies, particularly the Department of Defense (DoD), this notation has been all but abandoned by everyone else. Avoid it if you can.
UML         This is not an official data modeling notation (yet). Although several suggestions for a data modeling profile for the UML exist, none are complete and more importantly are not "official" UML yet. However, the Object Management Group (OMG) in December 2005 announced an RFP for data-oriented models.
3. How to Model Data
It is critical for an application developer to have a grasp of the fundamentals of data modeling so they can not only read data models but also work effectively with Agile DBAs who are responsible for the data-oriented aspects of your project. Your goal reading this section is not to learn how to become a data modeler, instead it is simply to gain an appreciation of what is involved.
The following tasks are performed in an iterative manner:
Identify entity types
Identify attributes
Apply naming conventions
Identify relationships
Apply data model patterns
Assign keys
Normalize to reduce data redundancy
Denormalize to improve performance
Very good practical books about data modeling include Joe Celko's Data & Databases and Data Modeling for Information Professionals as they both focus on practical issues with data modeling. The Data Modeling Handbook and Data Model Patterns are both excellent resources once you've mastered the fundamentals. An Introduction to Database Systems is a good academic treatise for anyone wishing to become a data specialist.
3.1 Identify Entity Types
An entity type, also simply called entity (not exactly accurate terminology, but very common in practice), is similar conceptually to object-orientation's concept of a class: an entity type represents a collection of similar objects. An entity type could represent a collection of people, places, things, events, or concepts. Examples of entities in an order entry system would include Customer, Address, Order, Item, and Tax. If you were class modeling you would expect to discover classes with the exact same names. However, the difference between a class and an entity type is that classes have both data and behavior whereas entity types just have data.
Ideally an entity should be normal, the data modeling world's version of cohesive. A normal entity depicts one concept, just like a cohesive class models one concept. For example, customer and order are clearly two different concepts, therefore it makes sense to model them as separate entities.
3.2 Identify Attributes
Each entity type will have one or more data attributes. For example, in Figure 1 you saw that the Customer entity has attributes such as First Name and Surname and in Figure 2 that the TCUSTOMER table had corresponding data columns CUST_FIRST_NAME and CUST_SURNAME (a column is the implementation of a data attribute within a relational database).
Attributes should also be cohesive from the point of view of your domain, something that is often a judgment call. In Figure 1 we decided that we wanted to model the fact that people had both first and last names instead of just a name (e.g. "Scott" and "Ambler" vs. "Scott Ambler") whereas we did not distinguish between the sections of an American zip code (e.g. 90210-1234-5678). Getting the level of detail right can have a significant impact on your development and maintenance efforts. Refactoring a single data column into several columns can be difficult (database refactoring is described in detail in Database Refactoring), although over-specifying an attribute (e.g. having three attributes for zip code when you only needed one) can result in overbuilding your system, and hence you incur greater development and maintenance costs than you actually needed.
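The asymmetry between the two choices can be seen in a few lines of Python. This is only a sketch: the class and field names are assumptions loosely based on Figure 1, not a prescribed design. Combining two fine-grained attributes is trivial, whereas splitting a coarse one after the fact relies on every stored value sharing the same internal format.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    first_name: str  # modeled as two attributes, as in Figure 1
    surname: str

    def full_name(self) -> str:
        # Combining fine-grained attributes is always possible.
        return f"{self.first_name} {self.surname}"


@dataclass
class Address:
    zip_code: str  # deliberately kept as a single attribute

    def zip_sections(self) -> list:
        # Splitting later only works if every stored value uses the same
        # separator, which is why this kind of refactoring can be difficult.
        return self.zip_code.split("-")


customer = Customer("Scott", "Ambler")
address = Address("90210-1234-5678")
```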
3.3 Apply Data Naming Conventions
Your organization should have standards and guidelines applicable to data modeling, something you should be able to obtain from your enterprise administrators (if they don't exist you should lobby to have some put in place). These guidelines should include naming conventions for both logical and physical modeling; the logical naming conventions should be focused on human readability whereas the physical naming conventions will reflect technical considerations. You can clearly see that different naming conventions were applied in Figures 1 and 2.
As you saw in Introduction to Agile Modeling, AM includes the Apply Modeling Standards practice. The basic idea is that developers should agree to and follow a common set of modeling standards on a software project. Just like there is value in following common coding conventions (clean code that follows your chosen coding guidelines is easier to understand and evolve than code that doesn't), there is similar value in following common modeling conventions.
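As a rough illustration of how a physical naming convention can be applied mechanically, the sketch below derives column names such as CUST_FIRST_NAME from readable logical names. The function name, the abbreviation table, and the rule itself are assumptions made for this example; your organization's standards will differ.

```python
# Assumed entity abbreviations; a real standard would define its own list.
ENTITY_ABBREVIATIONS = {"Customer": "CUST", "Address": "ADDR"}


def physical_column_name(entity: str, attribute: str) -> str:
    """Turn a readable logical name into an upper-case, prefixed column name."""
    prefix = ENTITY_ABBREVIATIONS[entity]
    words = attribute.upper().split()
    return "_".join([prefix] + words)


print(physical_column_name("Customer", "First Name"))  # CUST_FIRST_NAME
print(physical_column_name("Customer", "Surname"))     # CUST_SURNAME
```

Encoding the rule once, rather than naming each column by hand, is one way to keep logical and physical names consistently related as the model evolves.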
3.4 Identify Relationships
In the real world entities have relationships with other entities. For example, customers PLACE orders, customers LIVE AT addresses, and line items ARE PART OF orders. Place, live at, and are part of are all terms that define relationships between entities. The relationships between entities are conceptually identical to the relationships (associations) between objects.
Figure 5 depicts a partial LDM for an online ordering system. The first thing to notice is the various styles applied to relationship names and roles: different relationships require different approaches. For example, the relationship between Customer and Order has two names, places and is placed by, whereas the relationship between Customer and Address has one. In this example having a second name on the relationship, the idea being that you want to specify how to read the relationship in each direction, is redundant; you're better off finding a clear wording for a single relationship name, decreasing the clutter on your diagram. Similarly, you will often find that specifying the roles that an entity plays in a relationship negates the need to give the relationship a name (although some CASE tools may inadvertently force you to do this). For example, the role of billing address and the label billed to are clearly redundant, you really only need one. Likewise, the role part of that Line Item has in its relationship with Order is sufficiently obvious without a relationship name.
Figure 5. A logical data model (Information Engineering notation).
You also need to identify the cardinality and optionality of a relationship (the UML combines the concepts of optionality and cardinality into the single concept of multiplicity). Cardinality represents the concept of "how many" whereas optionality represents the concept of "whether you must have something." For example, it is not enough to know that customers place orders. How many orders can a customer place? None, one, or several? Furthermore, relationships are two-way streets: not only do customers place orders, but orders are placed by customers. This leads to questions like: how many customers can be involved in any given order, and is it possible to have an order with no customer involved? Figure 5 shows that customers place zero or more orders and that any given order is placed by one customer and one customer only. It also shows that a customer lives at one or more addresses and that any given address has zero or more customers living at it.
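One way such rules might surface in a physical schema is sketched below with Python's sqlite3 module. The table and column names are assumptions for illustration (the order table is called CustomerOrder here only because ORDER is an SQL keyword). A NOT NULL foreign key expresses "must have a customer", a single foreign key column expresses "one customer only", and nothing forces a customer to have any orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE Customer (
    CustomerNumber INTEGER PRIMARY KEY,
    Name           TEXT NOT NULL
);
CREATE TABLE CustomerOrder (
    OrderID        INTEGER PRIMARY KEY,
    -- NOT NULL: an order must be placed by a customer (optionality).
    -- One column: an order is placed by only one customer (cardinality).
    CustomerNumber INTEGER NOT NULL REFERENCES Customer (CustomerNumber)
);
""")

# A customer with zero orders is perfectly valid under these rules.
conn.execute("INSERT INTO Customer VALUES (1, 'Customer with no orders yet')")
conn.commit()


def can_insert_order(order_id, customer_number) -> bool:
    """Return True if the order row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on failure
            conn.execute(
                "INSERT INTO CustomerOrder VALUES (?, ?)",
                (order_id, customer_number),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

An order for customer 1 is accepted, while an order with no customer, or with a customer number that does not exist, is rejected. A "one or more" rule, such as a customer living at one or more addresses, cannot be expressed this simply and usually ends up enforced in application code or triggers.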
Although the UML distinguishes between different types of relationships (associations, inheritance, aggregation, composition, and dependency), data modelers often aren't as concerned with this issue as object modelers are. Subtyping, one application of inheritance, is often found in data models, an example of which is the is a relationship between Item and its two "sub entities" Service and Product. Aggregation and composition are much less common and typically must be implied from the data model, as you see with the part of role that Line Item takes with Order. UML dependencies are typically a software construct and therefore wouldn't appear on a data model, unless of course it was a very highly detailed physical model that showed how views, triggers, or stored procedures depended on other aspects of the database schema.
3.5 Apply Data Model Patterns
Some data modelers will apply common data model patterns (David Hay's book Data Model Patterns is the best reference on the subject), just as object-oriented developers will apply analysis patterns (Fowler 1997; Ambler 1997) and design patterns (Gamma et al. 1995). Data model patterns are conceptually closest to analysis patterns because they describe solutions to common domain issues. Hay's book is a very good reference for anyone involved in analysis-level modeling, even when you're taking an object approach instead of a data approach, because his patterns model business structures from a wide variety of business domains.
3.6 Assign Keys
There are two fundamental strategies for assigning keys to tables. First, you could assign a natural key, which is one or more existing data attributes that are unique to the business concept. In the Customer table of Figure 6 there are two candidate keys, in this case CustomerNumber and SocialSecurityNumber. Second, you could introduce a new column, called a surrogate key, which is a key that has no business meaning. An example is the AddressID column of the Address table in Figure 6. Addresses don't have an "easy" natural key because you would need to use all of the columns of the Address table to form a key for itself (you might be able to get away with just the combination of Street and ZipCode depending on your problem domain), therefore introducing a surrogate key is a much better option in this case.
Figure 6. Customer and Address revisited (UML notation).
Let's consider Figure 6 in more detail. Figure 6 presents an alternative design to that presented in Figure 2: a different naming convention was adopted and the model itself is more extensive. In Figure 6 the Customer table has the CustomerNumber column as its primary key and SocialSecurityNumber as an alternate key. This indicates that the preferred way to access customer information is through the value of a person's customer number, although your software can get at the same information if it has the person's social security number. The CustomerHasAddress table has a composite primary key, the combination of CustomerNumber and AddressID. A foreign key is one or more attributes in an entity type that represent a key, either primary or secondary, in another entity type. Foreign keys are used to maintain relationships between rows. For example, the relationship between rows in the CustomerHasAddress table and the Customer table is maintained by the CustomerNumber column within the CustomerHasAddress table. The interesting thing about the CustomerNumber column is the fact that it is part of the primary key for CustomerHasAddress as well as the foreign key to the Customer table. Similarly, the AddressID column is part of the primary key of CustomerHasAddress as well as a foreign key to the Address table to maintain the relationship with rows of Address.
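The key choices just described might be declared as follows. This is a minimal sketch using Python's sqlite3 module: the table and key column names follow the text, while the Surname column, the column types, and the sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE Customer (
    CustomerNumber       INTEGER PRIMARY KEY,   -- natural key used as primary key
    SocialSecurityNumber TEXT NOT NULL UNIQUE,  -- alternate key
    Surname              TEXT NOT NULL
);
CREATE TABLE Address (
    AddressID INTEGER PRIMARY KEY,              -- surrogate key, no business meaning
    Street    TEXT NOT NULL,
    ZipCode   TEXT NOT NULL
);
CREATE TABLE CustomerHasAddress (
    CustomerNumber INTEGER NOT NULL REFERENCES Customer (CustomerNumber),
    AddressID      INTEGER NOT NULL REFERENCES Address (AddressID),
    PRIMARY KEY (CustomerNumber, AddressID)     -- composite primary key
);
""")

conn.execute("INSERT INTO Customer VALUES (1001, '000-00-0000', 'Ambler')")
cur = conn.execute("INSERT INTO Address (Street, ZipCode) VALUES ('1 Main St', '90210')")
address_id = cur.lastrowid  # the database generates the surrogate key value
conn.commit()


def try_link(customer_number: int, addr_id: int) -> bool:
    """Return True if the link row was stored, False if a key constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on failure
            conn.execute(
                "INSERT INTO CustomerHasAddress VALUES (?, ?)",
                (customer_number, addr_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


linked = try_link(1001, address_id)

# The same customer row is reachable through either candidate key.
by_number = conn.execute(
    "SELECT Surname FROM Customer WHERE CustomerNumber = 1001"
).fetchone()
by_ssn = conn.execute(
    "SELECT Surname FROM Customer WHERE SocialSecurityNumber = '000-00-0000'"
).fetchone()
```

Linking the same customer and address a second time violates the composite primary key, and linking an unknown customer number violates the foreign key, so try_link returns False in both cases.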
Although the "natural vs. surrogate" debate is one of the great religious issues within the data community, the fact is that neither strategy is perfect and you'll discover that in practice (as we see in Figure 6) sometimes it makes sense to use natural keys and sometimes it makes sense to use surrogate keys. In Choosing a Primary Key: Natural or Surrogate? I describe the relevant issues in detail.
3.7 Normalize to Reduce Data Redundancy
Data normalization is a process in which data attributes within a data model are organized to increase the cohesion of entity types. In other words, the goal of data normalization is to reduce and even eliminate data redundancy, an important consideration for application developers because it is incredibly difficult to store objects in a relational database that maintains the same information in several places. Table 2 summarizes the three most common normalization rules describing how to put entity types into a series of increasing levels of normalization. Higher levels of data normalization (Date 2000) are beyond the scope of this article. With respect to terminology, a data schema is considered to be at the level of normalization of its least normalized entity type. For example, if all of your entity types are at second normal form (2NF) or higher then we say that your data schema is at 2NF.
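The terminology rule is simply a minimum over the entity types. A small sketch, in which the function name and the sample entity levels are invented for illustration:

```python
from typing import Dict

# Normal forms from least to most normalized, as used in this article.
NORMAL_FORM_ORDER = ["0NF", "1NF", "2NF", "3NF"]


def schema_normal_form(entity_levels: Dict[str, str]) -> str:
    """Return the level of the least normalized entity type in the schema."""
    return min(entity_levels.values(), key=NORMAL_FORM_ORDER.index)


# Hypothetical schema: one entity type at 2NF holds the whole schema at 2NF.
example_levels = {"Customer": "3NF", "Order": "2NF", "Item": "3NF"}
```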
Table 2. Data Normalization Rules.
Level                       Rule
First normal form (1NF)     An entity type is in 1NF when it contains no repeating groups of data.
Second normal form (2NF)    An entity type is in 2NF when it is in 1NF and when all of its non-key attributes are fully dependent on its primary key.
Third normal form (3NF)     An entity type is in 3NF when it is in 2NF and when all of its attributes are directly dependent on the primary key.
Figure 7 depicts a database schema in 0NF whereas Figure 8 depicts a normalized schema in 3NF. Read the Introduction to Data Normalization essay for details.
Why data normalization? The advantage of having a highly normalized data schema is that information is stored in one place and one place only, reducing the possibility of inconsistent data. Furthermore, highly normalized data schemas in general are closer conceptually to object-oriented schemas because the object-oriented goals of promoting high cohesion and loose coupling between classes result in similar solutions (at least from a data point of view). This generally makes it easier to map your objects to your data schema. Unfortunately, normalization usually comes at a performance cost. With the data schema of Figure 7 all the data for a single order is stored in one row (assuming orders of up to nine order items), making it very easy to access. With the data schema of Figure 7 you could quickly determine the total amount of an order by reading the single row from the Order0NF table. To do so with the data schema of Figure 8 you would need to read data from a row in the Order table, data from all the rows from the OrderItem table for that order, and data from the corresponding rows in the Item table for each order item. For this query, the data schema of Figure 7 very likely provides better performance.
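The difference between the two access paths can be sketched with Python's sqlite3 module. Figures 7 and 8 are not reproduced here, so apart from the table names Order0NF, Order, OrderItem, and Item every column name is an assumption, and only two repeating item groups are shown instead of nine to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- 0NF sketch: item data repeats inside the order row.
CREATE TABLE Order0NF (
    OrderID   INTEGER PRIMARY KEY,
    ItemName1 TEXT, Quantity1 INTEGER, Price1 REAL,
    ItemName2 TEXT, Quantity2 INTEGER, Price2 REAL
);
-- 3NF sketch: each fact is stored in one place only.
CREATE TABLE Item (ItemID INTEGER PRIMARY KEY, Name TEXT, Price REAL);
CREATE TABLE "Order" (OrderID INTEGER PRIMARY KEY);
CREATE TABLE OrderItem (
    OrderID  INTEGER REFERENCES "Order" (OrderID),
    ItemID   INTEGER REFERENCES Item (ItemID),
    Quantity INTEGER,
    PRIMARY KEY (OrderID, ItemID)
);
INSERT INTO Order0NF VALUES (1, 'Pen', 2, 1.5, 'Pad', 1, 4.0);
INSERT INTO Item VALUES (10, 'Pen', 1.5), (20, 'Pad', 4.0);
INSERT INTO "Order" VALUES (1);
INSERT INTO OrderItem VALUES (1, 10, 2), (1, 20, 1);
""")


def total_0nf(order_id: int) -> float:
    # Everything needed is in a single row of a single table.
    row = conn.execute(
        "SELECT Quantity1 * Price1 + Quantity2 * Price2 "
        "FROM Order0NF WHERE OrderID = ?",
        (order_id,),
    ).fetchone()
    return row[0]


def total_3nf(order_id: int) -> float:
    # Every order item row must be joined to its item row.
    row = conn.execute(
        "SELECT SUM(oi.Quantity * i.Price) "
        "FROM OrderItem AS oi JOIN Item AS i ON i.ItemID = oi.ItemID "
        "WHERE oi.OrderID = ?",
        (order_id,),
    ).fetchone()
    return row[0]
```

Both functions return the same total for order 1, but the normalized version touches several rows across two tables to get there. The other side of the trade is that a price change in the normalized schema is a single update to Item, whereas in the 0NF schema the price is repeated in every order row that mentions the item.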
Figure 7. An Initial Data Schema for Order (UML Notation).
Figure 8. A normalized schema in 3NF (UML Notation).
In class modeling, there is a similar concept called Class Normalization, although that is beyond the scope of this article.
3.8 Denormalize to Improve Performance
Normalized data schemas, when put into production, often suffer from performance problems. This makes sense: the rules of data normalization focus on reducing data redundancy, not on improving performance of data access. An important part of data modeling is to denormalize portions of your data schema to improve database access times. For example, the data model of Figure 9 looks nothing like the normalized schema of Figure 8. To understand why the differences between the schemas exist you must consider the performance needs of the application. The primary goal of this system is to process new orders from online customers as quickly as possible. To do this customers need to be able to search for items and add them to their order quickly, remove items from their order if need be, then have their final order totaled and recorded quickly. The secondary goal of the system is to then process, ship, and bill the orders afterwards.
Figure 9. A Denormalized Order Data Schema (UML notation).
To denormalize the data schema the following decisions were made:
1. To support quick searching of item information the Item table was left alone.
2. To support the addition and removal of order items to an order the concept of an OrderItem table was kept, albeit split in two to support outstanding orders and fulfilled orders. New order items can easily be inserted into the OutstandingOrderItem table, or removed from it, as needed.
3. To support order processing the Order and OrderItem tables were reworked into pairs to handle outstanding and fulfilled orders respectively. Basic order information is first stored in the OutstandingOrder and OutstandingOrderItem tables and then when the order has been shipped and paid for the data is removed from those tables and copied into the FulfilledOrder and FulfilledOrderItem tables respectively. Data access time to the two tables for outstanding orders is reduced because only the active orders are being stored there. On average an order may be outstanding for a couple of days, whereas for financial reporting reasons it may be stored in the fulfilled order tables for several years until archived. There is a performance penalty under this scheme because of the need to delete outstanding orders and then resave them as fulfilled orders, clearly something that would need to be processed as a transaction.
4. The contact information for the person(s) the order is being shipped and billed to was also denormalized back into the Order table, reducing the time it takes to write an order to the database because there is now one write instead of two or three. The retrieval and deletion times for that data would also be similarly improved.
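The move from the outstanding tables to the fulfilled tables described in decision 3 might look like the following sketch, again using Python's sqlite3 module. Only the table names OutstandingOrder and FulfilledOrder come from the text; the columns, the sample row, and the function names are assumptions, and the order item tables are omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OutstandingOrder (OrderID INTEGER PRIMARY KEY, ShipToName TEXT, Total REAL);
CREATE TABLE FulfilledOrder   (OrderID INTEGER PRIMARY KEY, ShipToName TEXT, Total REAL);
INSERT INTO OutstandingOrder VALUES (1, 'S. Ambler', 7.0);
""")


def fulfill_order(order_id: int) -> None:
    """Copy an order to the fulfilled table and delete it from the outstanding one."""
    # "with conn" commits if both statements succeed and rolls back if either
    # fails, so an order is never lost or stored in both tables.
    with conn:
        conn.execute(
            "INSERT INTO FulfilledOrder "
            "SELECT OrderID, ShipToName, Total FROM OutstandingOrder WHERE OrderID = ?",
            (order_id,),
        )
        conn.execute("DELETE FROM OutstandingOrder WHERE OrderID = ?", (order_id,))


def order_counts() -> tuple:
    """Return (number of outstanding orders, number of fulfilled orders)."""
    outstanding = conn.execute("SELECT COUNT(*) FROM OutstandingOrder").fetchone()[0]
    fulfilled = conn.execute("SELECT COUNT(*) FROM FulfilledOrder").fetchone()[0]
    return outstanding, fulfilled
```

Before fulfill_order(1) is called there is one outstanding order and no fulfilled orders; afterwards the counts are reversed.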
Note that if your initial, normalized data design meets the performance needs of your application then it is fine as is. Denormalization should be resorted to only when performance testing shows that you have a problem with your objects and subsequent profiling reveals that you need to improve database access time. As my grandfather said, if it ain't broke don't fix it.
5. Evolutionary/Agile Data Modeling
Evolutionary data modeling is data modeling performed in an iterative and incremental manner. The article Evolutionary Development explores evolutionary software development in greater detail. Agile data modeling is evolutionary data modeling done in a collaborative manner. The article Agile Data Modeling: From Domain Modeling to Physical Modeling works through a case study which shows how to take an agile approach to data modeling.
Although you wouldn't think it, data modeling can be one of the most challenging tasks that an Agile DBA can be involved with on an agile software development project. Your approach to data modeling will often be at the center of any controversy between the agile software developers and the traditional data professionals within your organization. Agile software developers will lean towards an evolutionary approach where data modeling is just one of many activities whereas traditional data professionals will often lean towards a big design up front (BDUF) approach where data models are the primary artifacts, if not THE artifacts. This problem results from a combination of the cultural impedance mismatch, a misguided need to enforce the "one truth", and "normal" political maneuvering within your organization. As a result Agile DBAs often find that navigating the political waters is an important part of their data modeling efforts.
6. How to Become Better At Modeling Data
How do you improve your data modeling skills? Practice, practice, practice. Whenever you get a chance you should work closely with Agile DBAs, volunteer to model data with them, and ask them questions as the work progresses. Agile DBAs will be following the AM practice Model With Others so should welcome the assistance as well as the questions: one of the best ways to really learn your craft is to have someone ask "why are you doing it that way". You should be able to learn physical data modeling skills from Agile DBAs, and often logical data modeling skills as well.
Similarly you should take the opportunity to work with the enterprise architects within your organization. As you saw in Agile Enterprise Architecture they should be taking an active role on your project, mentoring your project team in the enterprise architecture (if any), mentoring you in modeling and architectural skills, and aiding in your team's modeling and development efforts. Once again, volunteer to work with them and ask questions when you are doing so. Enterprise architects will be able to teach you conceptual and logical data modeling skills as well as instill an appreciation for enterprise issues.
You also need to do some reading. Although this article is a good start it is only a brief introduction. The best approach is to simply ask the Agile DBAs that you work with what they think you should read.
My final word of advice is that it is critical for application developers to understand and appreciate the fundamentals of data modeling. This is a valuable skill to have and has been since the 1970s. It also provides a common framework within which you can work with Agile DBAs, and may even prove to be the initial skill that enables you to make a career transition into becoming a full-fledged Agile DBA.
Recommended Reading
The book Disciplined Agile Delivery: A Practitioner's Guide to Agile Software Delivery in the Enterprise describes the Disciplined Agile Delivery (DAD) process decision framework. The DAD framework is a people-first, learning-oriented hybrid agile approach to IT solution delivery. It has a risk-value delivery lifecycle, is goal-driven, is enterprise aware, and provides the foundation for scaling agile. This book is particularly important for anyone who wants to understand how agile works from end to end within an enterprise setting. Data professionals will find it interesting because it shows how agile modeling and agile database techniques fit into the overall solution delivery process. Enterprise professionals will find it interesting because it explicitly promotes the idea that disciplined agile teams should be enterprise aware and therefore work closely with enterprise teams. Existing agile developers will find it interesting because it shows how to extend Scrum-based and Kanban-based strategies to provide a coherent, end-to-end streamlined delivery process.
I also maintain an agile database books page which overviews many books you will find interesting.
Copyright 2002-2013 Ambysoft Inc.
This site owned by Ambysoft Inc.