Logistic Regression

This document provides instructions for an in-class data science exercise using Tableau. Students will analyze an order data set to forecast future sales, determine which products are frequently purchased together, and interpret the results. The exercise involves downloading an order data file, using Tableau to forecast sales based on historical daily sales data, and performing an association analysis on the data to find products that are commonly purchased together within the same order. The document provides step-by-step directions for completing these analyses in Tableau.

Uploaded by

rphmi

MIS 0855: Data Science

In-Class Exercise: Simple Predictive Analytics Using Tableau
Objective: Analyze a data set to make inferences about future outcomes

Learning Outcomes:

Forecast future sales based on order transaction data
Perform association analysis to determine which products are purchased together
Interpret the meaning of the results from these analyses
In this exercise, you'll once again be working with a data set of orders for an imaginary
company, Vandelay Industries.
The data set contains 102,531 line items for 60,011 orders placed between January 1, 2009 and
December 31, 2013.

Part 1: Download the data file
1) Go to the Community Site post for this in-class exercise. Right-click VandelayOrdersAll.xlsx
and save it to your computer.

2) Open the data file in Excel. Take a quick look through the data and the Data Dictionary tab.

Part 2: Forecast future sales in Tableau
The first thing we'll do is use Tableau to predict future sales based on daily sales from 2009
through 2013. Tableau has a forecasting feature built in, so it's easy to do.

1) Start Tableau and click Connect to data.

2) Click Microsoft Excel.

3) Open the VandelayOrdersAll.xlsx workbook.

4) Drag the VandelayOrders(All) sheet to the white space. Then click Go to Worksheet.



5) Drag the Order Short Date dimension to the Columns shelf and Total Product Price to
the Rows shelf.

6) Click the line graph under the Show Me area.

7) You'll see a line graph of the year-to-year aggregate sales.

Notice Order Short Date appears as YEAR(Order Short Date). Tableau automatically presents
dates as hierarchies so you can drill down to Quarter (or Month or Day).
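
One way to picture the date hierarchy is as the same daily totals regrouped under coarser or finer keys. The Python sketch below illustrates that idea on made-up daily figures. The function name `aggregate_sales` and the sample rows are assumptions for illustration only; they are not part of Tableau or the Vandelay data set.

```python
from collections import defaultdict
from datetime import date


def aggregate_sales(daily_sales, level):
    """Sum (date, amount) pairs by 'year', 'quarter', or 'month'."""
    totals = defaultdict(float)
    for day, amount in daily_sales:
        if level == "year":
            key = (day.year,)
        elif level == "quarter":
            # Months 1-3 map to quarter 1, months 4-6 to quarter 2, and so on.
            key = (day.year, (day.month - 1) // 3 + 1)
        elif level == "month":
            key = (day.year, day.month)
        else:
            raise ValueError(f"unknown level: {level}")
        totals[key] += amount
    return dict(sorted(totals.items()))


# Made-up sample rows, not taken from VandelayOrdersAll.xlsx.
sample = [
    (date(2009, 1, 15), 100.0),
    (date(2009, 2, 3), 50.0),
    (date(2009, 5, 20), 75.0),
    (date(2010, 1, 2), 25.0),
]

by_year = aggregate_sales(sample, "year")
by_quarter = aggregate_sales(sample, "quarter")
by_month = aggregate_sales(sample, "month")
```

Drilling down never changes the grand total; it only splits the same sales across more, smaller buckets.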



8) Click on the plus sign next to YEAR to drill down to quarters. You'll see this:

9) Now we can run a forecast by selecting the Analysis menu and then Forecast/Show
Forecast. You'll see this:



There is a gap because Tableau doesn't count the last data period in its analysis. In this case,
our last data period is the fourth quarter of 2013.

Let's talk about some other aspects of this chart:

The solid line to the right of the gap shows the forecasted values, that is, the prediction of
future sales.
The shaded area is the 95% prediction interval. This means that the actual values are expected
to fall somewhere in the shaded range 95% of the time. Note that the solid line is right down
the middle of the prediction interval.
The prediction interval is pretty wide, which means it is difficult for Tableau to be
confident about its prediction using quarterly data. There's just not enough of it to make
a good prediction.
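
To make the forecast line and the shaded band more concrete, here is a rough Python sketch of a one-step-ahead forecast with a symmetric prediction interval. Tableau's documentation describes its forecasts as exponential smoothing models. The version below is a much simpler stand-in: it uses simple exponential smoothing with a fixed `alpha`, assumes roughly normal forecast errors, and runs on made-up quarterly numbers. The function name `ses_forecast` and every value are illustrative assumptions.

```python
from statistics import NormalDist, stdev


def ses_forecast(values, alpha=0.5, level=0.95):
    """One-step-ahead simple exponential smoothing with a rough interval.

    Returns (point_forecast, lower_bound, upper_bound).
    """
    smoothed = values[0]
    errors = []
    for actual in values[1:]:
        # The current smoothed value is the forecast for this period.
        errors.append(actual - smoothed)
        smoothed = alpha * actual + (1 - alpha) * smoothed
    # z is about 1.96 for a 95% interval under a normal-error assumption.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(errors)
    return smoothed, smoothed - half_width, smoothed + half_width


# Made-up quarterly sales, not the Vandelay figures.
quarterly_sales = [120.0, 135.0, 128.0, 150.0, 142.0, 160.0, 155.0, 170.0]
point, low, high = ses_forecast(quarterly_sales)
```

The point forecast sits in the middle of the band because the sketch adds and subtracts the same half-width, which mirrors the solid line running down the middle of the shaded area.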

10) Click on the plus sign next to QUARTER to drill down to MONTH. You'll see this:



Notice that the gap is smaller (because it is only leaving out one month, not one quarter),
and that the prediction interval is much narrower. The main reason for this is that Tableau has
much more data to work with (60 months instead of 20 quarters).

In general, the more data points you use, the better your predictions become.

11) Let's change the confidence level of the prediction interval. Go to the Analysis menu and
select Forecast/Forecast Options

12) Change the prediction interval to 99%. Then click OK.

13) You'll see the prediction interval get slightly wider, since now you're asking Tableau to
present a range of values that will contain the actual value 99% of the time (instead of 95%).




To see why this is true, think about a game where you throw crumpled-up paper into a
wastebasket. Say you successfully get the paper into the wastebasket 95% of the time.
If you want to make sure you get it into the wastebasket 99% of the time, one option is
to buy a larger wastebasket!

A larger prediction interval is like a larger
wastebasket.
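
The size of the "larger wastebasket" can be estimated with a short calculation. The sketch below assumes forecast errors are roughly normally distributed, which is a simplifying assumption and not a statement about how Tableau computes its band. The standard error of 1000.0 is a made-up value, and `interval_half_width` is a hypothetical helper name.

```python
from statistics import NormalDist


def interval_half_width(std_error, level):
    """Half-width of a symmetric interval under a normal-error assumption."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * std_error


# Hypothetical standard error of the forecast, in sales dollars.
w95 = interval_half_width(1000.0, 0.95)
w99 = interval_half_width(1000.0, 0.99)
ratio = w99 / w95
```

Under that assumption the 99% band is roughly 1.3 times as wide as the 95% band, which fits the "slightly wider" interval seen in step 13.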


14) Save your Tableau workbook and close it.

Part 3: Perform an association analysis in Tableau
(Adapted from kb.tableausoftware.com/articles/knowledgebase/marketbasketanalysis)

Association analysis is discovering when events occur at the same time. In this case, we're
looking for which products are purchased together (within the same order).
Tableau doesn't have an association analysis function, but with some clever table joining we
can do a simple version of the type of analysis more sophisticated data mining programs do.
1) Open Tableau again. Make sure you're starting a new Tableau file.

2) Click Connect to data.

3) Click Microsoft Excel.

4) Click ONCE on VandelayOrdersAll.xlsx. Just select the file; don't open it!

5) Click the down arrow next to Open and select Open with Legacy Connection.



6) Drag the VandelayOrders(All) sheet to the white space.

7) Again, drag the VandelayOrders(All) sheet to the white space a second time. It should
look like this:


Note that the Join dialog may cover up the second VandelayOrders(All) sheet.

8) If you don't see the join dialog, click on the join area between the two sheets:


9) You'll create two joins:

Select Product Name from Data Source and VandelayOrders(All)$1
Select the <> symbol from the middle drop-down box.

Select Order ID from Data Source and VandelayOrders(All)$1
Select the = symbol from the middle drop-down box.

It should look EXACTLY like this:



So what does this mean? It's called a self-join: you're connecting the table with itself.

You're asking Tableau to match up any combination of different products
(Product Name <> Product Name)
that are part of the same order
(Order ID = Order ID).
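
The two join conditions can be sketched in a few lines of Python. This is only an illustration of the self-join logic on made-up line items; the product names, order numbers, and the `self_join` helper are assumptions, and the nested loop is not how Tableau executes a join.

```python
# Made-up line items as (order_id, product_name); not the Vandelay data.
line_items = [
    (1, "Jeans"),
    (1, "Socks"),
    (1, "T-Shirt"),
    (2, "Jeans"),
    (2, "Socks"),
    (3, "Socks"),
]


def self_join(rows):
    """Pair each row with rows from the same order that name a different product."""
    return [
        (order_a, product_a, product_b)
        for order_a, product_a in rows
        for order_b, product_b in rows
        # Order ID = Order ID and Product Name <> Product Name
        if order_a == order_b and product_a != product_b
    ]


pairs = self_join(line_items)
```

Each combination shows up twice, once in each direction (Jeans with Socks and Socks with Jeans), and an order with a single product, such as order 3, produces no rows at all.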

10) When you have this set up like the image above, click Go to Worksheet.

11) Drag the Product Name dimension from VandelayOrders(All)$ (from the first set of
dimensions) to the Columns shelf.

Then drag the Product Name dimension from VandelayOrders(All)$1 (from the second
set of dimensions) to the Rows shelf.



12) You'll see something like this:

13) Under Measures, drag Number of Records to the Text icon under the Marks area.



14) You'll now see this:



This shows how many orders contained both products. For example, look at the first row.
We now know that Anti-Dentite Jeans and Anytown, USA Sweatshirts appeared together in
the same order 3 times (hover your mouse over the product name to see the whole thing).

Here are a few more:

Bad Breaker Upper Socks and Armoire T-Shirts appeared in the same order 43 times.
Baby Boxers and Astronaut Pen Boxers appeared in the same order 5 times.
BOSCO T-Shirts and Anti-Dentite Jeans appeared in the same order 2 times.
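
The table of counts can be approximated in Python by counting, for every ordered pair of products, how many orders contain both. The sketch below uses made-up line items and a hypothetical `pair_counts` helper. It counts each product once per order, which is an assumption that matches the "how many orders contained both products" reading; Tableau's Number of Records counts joined rows and could differ if a product appears on more than one line of the same order.

```python
from collections import Counter, defaultdict
from itertools import permutations


def pair_counts(rows):
    """Count the orders in which each ordered pair of products appears together."""
    products_by_order = defaultdict(set)
    for order_id, product in rows:
        products_by_order[order_id].add(product)
    counts = Counter()
    for products in products_by_order.values():
        # permutations yields both (a, b) and (b, a), like the two-way table.
        counts.update(permutations(sorted(products), 2))
    return counts


# Made-up line items as (order_id, product_name); not the Vandelay data.
line_items = [
    (1, "Jeans"),
    (1, "Socks"),
    (1, "T-Shirt"),
    (2, "Jeans"),
    (2, "Socks"),
    (3, "Socks"),
]

counts = pair_counts(line_items)
jeans_and_socks = counts[("Jeans", "Socks")]
```

Calling `counts.most_common(5)` would list the five most frequent combinations, which is the same question the colored squares answer visually later in the exercise.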

15) It's not difficult to understand, but it would be easier if we could generate an easy-to-read
visual of this data.

Drag Number of Records to the Color icon in the Marks area.

16) Click on the Color icon in the Marks area, then click Edit Colors.

17) Choose Area Red for the Palette.

18) Go back to Measures and drag SUM(Number of Records) in the Marks area to the Size
icon in the Marks area.

19) Click the Size icon in the Marks area and move the slider about two-thirds of the way to the
right.



20) It's now very easy to see the product combinations that are most popular.

21) If you want to see detailed information about a product combination, hover your mouse
over a square.



22) Save your Tableau workbook.
