Sci-Fin: Visual Mining Spatial and Temporal Behavior Features from Social Media
Abstract
1. Introduction
- Three novel visual fingerprint designs that capture the characteristics of check-in data from the aspects of space-time features, activity-event features and user-topic features. The designs are intuitive, compact, and informative.
- A comprehensive visual analytics system based on the fingerprint designs to facilitate the analysis of social check-in data and uncover people’s spatio-temporal patterns from the data.
- Field implementation on a real social service platform, case studies with real check-in data, and interesting findings on users’ check-in behaviors.
2. Related Work
3. Visualization Design
3.1. Social Check-in Data
3.2. Task Abstraction
- (V1) Space-time related features: Spatio-temporal attributes are important in user behavior analysis. They allow analysts to capture exactly when and where users check in and what kinds of geographic objects surround them. Check-in data contain POI and geographic location information with time records. To visualize and analyze such complex features, we set up a region-based visual design that treats the region as the visualization unit. Regions can be constructed politically (partitioned by governments) or by users (mental maps); each covers a certain area in geographic space and contains a series of POIs. Our visual design aims to show spatial attributes together with temporal information and other features, such as user check-in activities or check-in counts in those areas.
- (V2) Activity-event related features: Beyond users’ check-in locations and times, analysts also need to know what users are doing at a certain location or at a specific point in time. With this information, analysts can trace users’ social trends, evaluate the influence of an activity or event (defined here as its evolution in the spatial and temporal dimensions), and monitor events or activities online to detect anomalies or discover correlations among activities. Our visual design aims to give analysts a combined display that integrates spatial changes with temporal information for a specified activity. The number of users can also be visualized and explored in the view.
- (V3) User-topic related features: Many powerful new applications, such as real-world personalized recommendations, can be developed from the analysis of users’ check-in behaviors. With personalized recommendations, businesses can locate potential customers more efficiently. It is therefore important to provide a visual display that shows the spatio-temporal evolution of certain groups of users, together with their contexts, in order to discover their behaviors.
- (T1) Queries related to a region: What kinds of check-in activities occur in a given region? What is the total number of check-ins for different activities in that area? Does one dominant activity exist (i.e., an activity that contributes the majority of check-in records in a region)?
- (T2) Queries related to an activity: What is the spatial distribution of the number of check-in users across different regions? Do any routine patterns exist in the time-related features? This task also covers hot spot detection and analysis.
- (T3) Queries related to a certain user group, in support of location-based services: Which places does the group visit frequently, at what times, and for which activities? What is the favorite time for users’ activities (check-ins)? What is the most likely check-in time for users when they play sports? What is the grouping pattern of a group?
- (T4) Comparisons of two or more regions, activities, or users/user groups: Can we identify differences among regions (that is, can politically constructed regions be distinguished by their social activities)? Does an activity show a pattern in its spatial distribution (concentrated or scattered)? Do spatial or temporal correlations exist between different activities?
- (T5) Complex analytical tasks such as correlation or co-occurrence patterns involving different regions, activities, and user groups: Does similar geographic information lead to similar visual patterns (activity, user number, temporal distribution)? Does an activity have a global influence or only a local one? Do different user groups living in similar neighborhoods show similar behavior (sequences)?
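As a rough illustration of the region queries in T1, the sketch below counts check-ins per activity within one region and tests whether a dominant activity exists. The record layout, the sample region and activity names, the function names, and the 50% threshold used to read "majority" are all assumptions made for this example; none of them come from the system described here.

```python
from collections import Counter

# Hypothetical check-in records: (user_id, region, activity).
# Names and values are illustrative, not taken from the paper's dataset.
CHECKINS = [
    ("u1", "Pudong", "dinner"),
    ("u2", "Pudong", "dinner"),
    ("u3", "Pudong", "sport"),
    ("u1", "Xuhui", "work"),
    ("u4", "Pudong", "dinner"),
]


def activity_counts(checkins, region):
    """Count check-in records per activity inside one region (task T1)."""
    return Counter(activity for _, r, activity in checkins if r == region)


def dominant_activity(counts, threshold=0.5):
    """Return the activity holding more than `threshold` of all records, else None.

    The 0.5 default is an assumed reading of "majority".
    """
    total = sum(counts.values())
    if total == 0:
        return None
    activity, n = counts.most_common(1)[0]
    return activity if n / total > threshold else None


counts = activity_counts(CHECKINS, "Pudong")
print(counts, dominant_activity(counts))
```

The same counting step, grouped by activity instead of region, would serve the activity queries in T2.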
3.3. Design Rationales
- (D1) The visual representation of social check-in data should capture the characteristics of the social, spatial, and temporal attributes of the data. Check-in data contain three major features: location, check-in activity (e.g., dinner or exercise), and user. These three features evolve in both space and time. An effective visual representation must convey the characteristics of check-in behaviors for each of the three features in a spatio-temporal manner. Fingerprints adhere to this rationale by transforming the check-in data into a three-level icon-based visual design integrated with well-established visualization techniques.
- (D2) Fingerprints should encode the hierarchical structure among the three major features: location, check-in activity, and user information. Here fingerprints use the well-established Voronoi Treemap visualization to convey this information in a compact display space. A system based on D1 can therefore display spatio-temporal information together with this hierarchical information, and it facilitates comparison since multiple fingerprints can be placed side by side for analysis.
- (D3) Any representation of social check-in data should be intuitive and informative to facilitate comparison of check-in behaviors at different locations, for different activities, and across different people. We therefore propose an icon-based glyph design, since icons can compactly convey a lot of information while occupying only a small area of the display. Icons can also be dynamically placed on interesting or important locations for intuitive spatial comparison. In addition, an icon-based design is very flexible in scale and has better rendering performance than a pixel-based display. We chose a radial layout since the circular shape can convey the most information by encoding most of the Voronoi cells in a compact area based on D2. Here we assume that most temporal comparison tasks focus on periodic patterns, so we use a circular bar chart laid out around the icon to encode temporal changes. For linear temporal exploration or analysis tasks, we added a ThemeRiver [33] function to our system, with enriched interactions, to visually depict changes in check-in activity strength over time using a river metaphor and to give users an overview of overall tendencies. To discover correlations between different spatial locations, we use WorldMapper distortion techniques to show spatial changes in the icon center. Because people who use maps in daily life are likely to be familiar with the regions’ geometric shapes, we can exploit this familiarity to convey spatial changes in a small area (the icon center) so that analysts can discover correlations such as co-located activities.
- (D4) Fingerprints for similar spatio-temporal patterns should appear visually similar, while dissimilar patterns should have unique visual features that are easily distinguishable. Fingerprints should provide at-a-glance representations that allow users to easily determine which regions, activities, or users are unique and which are similar in shape. This design requirement can be critical for both pattern identification and comparison tasks. Fingerprints satisfy this guideline by using well-established visualization methods such as WorldMapper, Voronoi Treemap, and radial layout as their basic components.
- (D5) The visual representation should allow users to interactively manipulate check-in data analysis results for refinement and further exploration of interesting patterns. Our system therefore provides a set of enriched interactions such as spatial and temporal filtering, density display, zoom and pan, activity-based queries in different geographic spaces, grouping, and highlighting.
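To make the hierarchy in D2 concrete, the following sketch nests check-in records as region, then activity, then number of distinct users, and converts one level into relative area shares of the kind a treemap layout takes as input. This only prepares the data; it does not compute a Voronoi Treemap. The tuple layout, the function names `build_hierarchy` and `area_shares`, and the choice of distinct users as the leaf weight are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical records: (user_id, region, activity); values are made up.
RECORDS = [
    ("u1", "A", "dinner"),
    ("u2", "A", "dinner"),
    ("u1", "A", "dinner"),  # repeat visit by u1, counted once below
    ("u3", "A", "sport"),
    ("u1", "B", "work"),
]


def build_hierarchy(checkins):
    """Nest records as region -> activity -> number of distinct users."""
    users_by_leaf = defaultdict(lambda: defaultdict(set))
    for user, region, activity in checkins:
        users_by_leaf[region][activity].add(user)
    return {
        region: {activity: len(users) for activity, users in activities.items()}
        for region, activities in users_by_leaf.items()
    }


def area_shares(weights):
    """Turn leaf weights into fractions of the parent cell's area."""
    total = sum(weights.values())
    if total == 0:
        return {key: 0.0 for key in weights}
    return {key: value / total for key, value in weights.items()}


tree = build_hierarchy(RECORDS)
print(tree)
print(area_shares(tree["A"]))
```

Using record counts instead of distinct users as the weight only requires replacing the set with a counter.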
3.4. Visual Encoding
- Size. The size of a Voronoi cell represents the number of check-in records for one kind of activity.
- Clock layout. The time distribution of the users’ check-in records and activities in the region is encoded along the fingerprint’s outer circular border. Different time scales can be chosen for the display, and the display switches between them smoothly in real time. We provide a 24 h distribution, a week distribution, and a month distribution.
- Color. The hue of color is used to represent a check-in activity (see Figure 1).
- Bar chart. Each bar chart slice encodes a type of activity. The length of the slice also encodes the number of records.
- Space distortion. In the center of the activity fingerprint, we display the borders of all the city’s regions (alternatively, users can select regions of interest for further exploration). The size of each region inside encodes the number of check-ins for the selected activity in that region.
- Clock layout. The time distribution of the users’ check-in records and activities in the region is encoded along the fingerprint’s outer circular border. Different time scales can be chosen for the display, and the display switches between them smoothly in real time. We provide a 24 h distribution, a week distribution, and a month distribution.
- Color. Color is used to represent the density of the check-in records (see Figure 2). Both the bar charts on the circle and the distorted regions in the center are colored according to their related check-in number.
- Bar chart. Each bar chart slice encodes one region if it has records of selected activities. The length of the slice also encodes the number of records.
- Voronoi cell. Inside this icon we use circles to encode all the regions visited by this user group according to the records. The Voronoi cells [35] inside the circles encode the check-in activity type. The size of a Voronoi cell encodes the number of check-in records of that activity in the region.
- Clock layout. The time distribution of the users’ check-in records and activities in the region is encoded along the fingerprint’s outer circular border. Different time scales can be chosen for the display, and the display switches between them smoothly in real time. We provide a 24 h distribution, a week distribution, and a month distribution.
- Color. The hue of a color is used to represent check-in activity (see Figure 3).
- Bar chart. Each bar is formed by a set of slices of different lengths. One slice on a bar represents one type of activity and its length shows the related number of check-in records.
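The clock layout used by the fingerprints can be approximated in two steps: bin the check-in timestamps by a periodic time scale, then map each bin to an angular slot and a bar length. The sketch below is a minimal version under stated assumptions: the function names, the radius parameters, and the linear scaling of bar length by count are illustrative choices, and only the 24 h and week scales are covered (the month scale is left out because its binning is not specified here).

```python
import math
from datetime import datetime


def bin_by_period(timestamps, scale="day"):
    """Histogram of check-ins over 24 hours ("day") or 7 weekdays ("week")."""
    if scale == "day":
        bins = [0] * 24
        for t in timestamps:
            bins[t.hour] += 1
    elif scale == "week":
        bins = [0] * 7
        for t in timestamps:
            bins[t.weekday()] += 1  # Monday is index 0
    else:
        raise ValueError(f"unsupported scale: {scale!r}")
    return bins


def radial_bars(bins, inner_radius=1.0, max_length=1.0):
    """Return (start_angle, end_angle, outer_radius) for each bin, in radians.

    Bin 0 starts at angle 0; how angles map to the screen is up to the renderer.
    Bar length grows linearly with the count (an assumed encoding).
    """
    n = len(bins)
    peak = max(bins) or 1  # avoid division by zero when all bins are empty
    step = 2 * math.pi / n
    return [
        (i * step, (i + 1) * step, inner_radius + max_length * count / peak)
        for i, count in enumerate(bins)
    ]


# Made-up timestamps for illustration.
stamps = [
    datetime(2016, 12, 5, 18, 30),
    datetime(2016, 12, 5, 18, 45),
    datetime(2016, 12, 6, 7, 0),
]
for start, end, radius in radial_bars(bin_by_period(stamps, "week")):
    print(f"{start:.2f} to {end:.2f} rad, outer radius {radius:.2f}")
```

Switching the time scale then amounts to re-binning the same timestamps and redrawing the bars.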
4. System Implementation
4.1. System Design and Overview
4.2. Data Model
- Data collection and storage.
- Data preprocessing: transforming the data into understandable information, extracting relevant features according to the concrete application tasks, and formulating expected patterns with mathematical models.
- Enriching the data with semantics and knowledge; designing proper layout encoding and rendering algorithms; adjusting the display to the hardware; and producing one or more visual structures on the screen.
- Interpretation in the context of the human perceptual and cognitive system: presenting the proposed visual fingerprints in an intuitive way to better facilitate comparison and the exploration of subsets of data features.
4.3. Interactions
5. Case Study and Interview
5.1. Case Studies
5.1.1. Case Study Setup
5.1.2. Case 1: Region-Based Fingerprinting
5.1.3. Case 2: Activity-Based Fingerprinting
5.1.4. Case 3: User-Group-Based Fingerprinting
5.1.5. Case 4: Exploration of College Students
5.2. Interview with Domain Experts
- Q1 What useful patterns can you observe?
- Q2 Can you conveniently find the problems in the mining algorithms and solve them?
5.2.1. Feedback of Interactive Visual Design
5.2.2. Applicability and Improvements
- User Modeling. The visualization tool helps experts directly uncover people’s spatio-temporal patterns from their check-in data. They can visually observe characteristics such as how people’s activities are distributed over time and location. This helps to model users more accurately. For example, they observed that people in Shanghai have dinner around 1 to 2 h earlier than people in Beijing. This implies that, for user modeling, users’ dinnertimes are conditional on their locations. Another important benefit is that experts can derive new business models from our visualization tool. For example, the tool can easily show them the times and locations at which different people are most likely to go for physical training. To use this information, experts can advise their terminal production group to add a new function to their instant messenger that helps people make new friends nearby when doing physical training (or, more generally, new friends with similar interests).
- Inspecting the Results of Data Mining Algorithms
- (a) Activity labeling algorithm. The experts use a rule-based method to label the activity implied in each check-in message. This method uses thousands of expert-defined rules. However, the experts cannot enumerate all situations that may fall under each activity, or manually check each rule’s correctness. With the help of the visualization tool, they successfully identified abnormal activity patterns and corrected their rule base accordingly. For example, the tool showed them that popular check-in messages submitted at Shanghai Customs during working hours were labeled as “working” by their rules, forming a dense sparkle at the location. Upon inspecting those messages, their data engineer found that most were apparently posted by tourists passing through Shanghai Customs. He therefore changed the original rule condition (which labels a message as “working”) from “people who are in the region of Shanghai Customs during working hours” to “people who stay in the region of Shanghai Customs during working hours”. This example shows that the visualization tool can help data mining experts continuously improve their activity labeling algorithm.
- (b) Activity extraction algorithm. The experts use a pattern-based approach to extract “verb-noun” pairs as activity descriptions from check-in messages. They then use the visualization tool to study the distribution of activity descriptions in different locations. The tool makes extraction errors made by their algorithm very obvious. For instance, they noticed two activity descriptions with very high frequencies at a restaurant: “eat (verb) haidi (noun)” and “lao (verb, roughly “catch” in English) hot pot (noun)”. In fact, the correct extraction should be “eat (verb) haidilao hot pot (noun)”. The tool helped the experts identify and rectify many such word segmentation and part-of-speech errors, allowing them to generate more accurate extractions.
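The rule change described in (a), from being "in" the region to "staying" in it, can be sketched as a dwell-time condition. The code below is one possible reading, not the experts' actual rule base: the working-hour window (09:00 to 18:00), the two-hour minimum stay, the function names, and the input layout (one user's check-ins on a single day) are all assumptions for illustration.

```python
from datetime import datetime, timedelta

WORK_START, WORK_END = 9, 18      # assumed working hours (09:00 to 18:00)
MIN_STAY = timedelta(hours=2)     # assumed threshold for "staying"
TARGET = "Shanghai Customs"


def in_working_hours(t):
    return WORK_START <= t.hour < WORK_END


def label_original(region, t, target=TARGET):
    """Original rule: any check-in in the region during working hours."""
    return "working" if region == target and in_working_hours(t) else None


def label_refined(user_visits, target=TARGET):
    """Refined rule: the user's check-ins in the region span at least MIN_STAY.

    `user_visits` is a list of (region, datetime) for one user on one day.
    """
    times = sorted(
        t for region, t in user_visits if region == target and in_working_hours(t)
    )
    if len(times) >= 2 and times[-1] - times[0] >= MIN_STAY:
        return "working"
    return None


# Made-up examples: a tourist passing through and an employee on site all day.
tourist = [(TARGET, datetime(2016, 5, 3, 10, 15))]
employee = [(TARGET, datetime(2016, 5, 3, 9, 5)), (TARGET, datetime(2016, 5, 3, 17, 30))]
print(label_original(*tourist[0]), label_refined(tourist), label_refined(employee))
```

Under these assumptions the tourist's single check-in is labeled "working" by the original rule but not by the refined one, which mirrors the mislabeling the experts found.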
6. Discussions and Evaluation
7. Conclusions
Supplementary Materials
Acknowledgments
Author Contributions
Conflicts of Interest
References
- Liu, S.; Wang, S.; Zhu, F.; Zhang, J.; Krishnan, R. Hydra: Large-scale social identity linkage via heterogeneous behavior modeling. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, Snowbird, UT, USA, 22–27 June 2014; pp. 51–62.
- Dorling, D.; Barford, A.; Newman, M. WORLDMAPPER: The world as you’ve never seen it before. IEEE Trans. Vis. Comput. Graph. 2006, 12, 757–764.
- Cho, E.; Myers, S.A.; Leskovec, J. Friendship and mobility: User movement in location-based social networks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, 21–24 August 2011; pp. 1082–1090.
- Wang, D.; Pedreschi, D.; Song, C.; Giannotti, F.; Barabasi, A.L. Human mobility, social ties, and link prediction. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, 21–24 August 2011; pp. 1100–1108.
- Sadilek, A.; Kautz, H.; Bigham, J.P. Finding your friends and following them to where you are. In Proceedings of the Fifth ACM International Conference on Web Search and Data Mining, Seattle, WA, USA, 8–12 February 2012; pp. 723–732.
- Backstrom, L.; Sun, E.; Marlow, C. Find me if you can: Improving geographical prediction with social and spatial proximity. In Proceedings of the 19th International Conference on World Wide Web, Raleigh, NC, USA, 26–30 April 2010; pp. 61–70.
- Gordon, E.; e Silva, A.d.S. Net Locality: Why Location Matters in a Networked World; John Wiley & Sons: Hoboken, NJ, USA, 2011.
- Cranshaw, J.; Toch, E.; Hong, J.; Kittur, A.; Sadeh, N. Bridging the gap between physical location and online social networks. In Proceedings of the 12th ACM International Conference on Ubiquitous Computing, Copenhagen, Denmark, 26–29 September 2010; pp. 119–128.
- Noulas, A.; Scellato, S.; Mascolo, C.; Pontil, M. Exploiting Semantic Annotations for Clustering Geographic Areas and Users in Location-based Social Networks. Soc. Mob. Web 2011, 11, 2.
- Chang, J.; Sun, E. Location 3: How users share and respond to location-based data on social networking sites. In Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media, Catalonia, Spain, 17–21 July 2011; pp. 74–80.
- Lindqvist, J.; Cranshaw, J.; Wiese, J.; Hong, J.; Zimmerman, J. I’m the mayor of my house: Examining why people use foursquare-a social-driven location sharing application. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Vancouver, BC, Canada, 7–12 May 2011; pp. 2409–2418.
- Cramer, H.; Rost, M.; Holmquist, L.E. Performing a check-in: Emerging practices, norms and ’conflicts’ in location-sharing using foursquare. In Proceedings of the 13th International Conference on Human Computer Interaction with Mobile Devices and Services, Stockholm, Sweden, 30 August–2 September 2011; pp. 57–66.
- Cranshaw, J.; Schwartz, R.; Hong, J.I.; Sadeh, N. The livehoods project: Utilizing social media to understand the dynamics of a city. In Proceedings of the Sixth International AAAI Conference on Weblogs and Social Media, Dublin, Ireland, 4–7 June 2012.
- Wang, M.; Qin, L.; Hu, Q. Data mining and visualization research of check-in data. In Proceedings of the 2012 20th International Conference on Geoinformatics (GEOINFORMATICS), Hong Kong, China, 15–17 June 2012; pp. 1–4.
- Kim, G.; Xing, E.P. Visualizing brand associations from web community photos. In Proceedings of the Seventh ACM International Conference on Web Search and Data Mining, New York, NY, USA, 24–28 February 2014; pp. 623–632.
- Andrienko, G.; Andrienko, N.; Bak, P.; Keim, D.; Wrobel, S. Visual Analytics of Movement; Springer: Berlin/Heidelberg, Germany, 2013.
- Kapler, T.; Wright, W. GeoTime information visualization. IEEE Symp. Inf. Vis. 2005, 4, 136–146.
- Tominski, C.; Schumann, H.; Andrienko, G.; Andrienko, N. Stacking-based visualization of trajectory attribute data. IEEE Trans. Vis. Comput. Graph. 2012, 18, 2565–2574.
- Ferreira, N.; Poco, J.; Vo, H.T.; Freire, J.; Silva, C.T. Visual Exploration of Big Spatio-Temporal Urban Data: A Study of New York City Taxi Trips. IEEE Trans. Vis. Comput. Graph. 2013, 19, 2149–2158.
- Andrienko, G.; Andrienko, N.; Schumann, H.; Tominski, C. Visualization of Trajectory Attributes in Space–Time Cube and Trajectory Wall. In Cartography from Pole to Pole: Selected Contributions to the XXVIth International Conference of the ICA, Dresden 2013; Springer: Berlin/Heidelberg, Germany, 2014; pp. 157–163.
- Crnovrsanin, T.; Muelder, C.; Correa, C.; Ma, K.L. Proximity-based visualization of movement trace data. In Proceedings of the 2009 IEEE Symposium on Visual Analytics Science and Technology (VAST), Atlantic City, NJ, USA, 12–13 October 2009; pp. 11–18.
- Demšar, U.; Buchin, K.; Cagnacci, F.; Safi, K.; Speckmann, B.; Van de Weghe, N.; Weiskopf, D.; Weibel, R. Analysis and visualisation of movement: An interdisciplinary review. Mov. Ecol. 2015, 3, 5.
- Andrienko, G.; Andrienko, N. Spatio-temporal aggregation for visual analysis of movements. In Proceedings of the 2008 IEEE Symposium on Visual Analytics Science and Technology (VAST), Columbus, OH, USA, 19–24 October 2008; pp. 51–58.
- Wang, Z.; Lu, M.; Yuan, X.; Zhang, J.; van de Wetering, H. Visual Traffic Jam Analysis Based on Trajectory Data. IEEE Trans. Vis. Comput. Graph. 2013, 19, 2159–2168.
- Lanir, J.; Bak, P.; Kuflik, T. Visualizing Proximity-Based Spatiotemporal Behavior of Museum Visitors Using Tangram Diagrams. Comput. Graph. Forum 2014, 33, 261–270.
- Wu, W.; Zheng, Y.; Qu, H.; Chen, W.; Gröller, E.; Ni, L.M. BoundarySeer: Visual analysis of 2D boundary changes. In Proceedings of the 2014 IEEE Conference on Visual Analytics Science and Technology (VAST), Paris, France, 25–31 October 2014; pp. 143–152.
- Guo, D.; Zhu, X. Origin-Destination Flow Data Smoothing and Mapping. IEEE Trans. Vis. Comput. Graph. 2014, 20, 2043–2052.
- Scheepens, R.; Willems, N.; van de Wetering, H.; Van Wijk, J.J. Interactive visualization of multivariate trajectory data with density maps. In Proceedings of the 2011 IEEE Pacific Visualization Symposium (PacificVis), Hong Kong, China, 1–4 March 2011; pp. 147–154.
- Andrienko, G.; Andrienko, N.; Bremm, S.; Schreck, T.; Von Landesberger, T.; Bak, P.; Keim, D. Space-in-Time and Time-in-Space Self-Organizing Maps for Exploring Spatiotemporal Patterns. Comput. Graph. Forum 2010, 29, 913–922.
- Andrienko, G.; Andrienko, N. A general framework for using aggregation in visual exploration of movement data. Cartogr. J. 2010, 47, 22–40.
- Zeng, W.; Fu, C.W.; Arisona, S.M.; Qu, H. Visualizing Interchange Patterns in Massive Movement Data. Comput. Graph. Forum 2013, 32, 271–280.
- Andrienko, N.; Andrienko, G.; Barrett, L.; Dostie, M.; Henzi, P. Space Transformation for Understanding Group Movement. IEEE Trans. Vis. Comput. Graph. 2013, 19, 2169–2178.
- Havre, S.; Hetzler, B.; Nowell, L. ThemeRiver: Visualizing theme changes over time. In Proceedings of the 2000 IEEE Symposium on Information Visualization (INFOVIS), Salt Lake City, UT, USA, 9–10 October 2000; pp. 115–123.
- Balzer, M.; Deussen, O. Voronoi treemaps. In Proceedings of the 2005 IEEE Symposium on Information Visualization (INFOVIS), Minneapolis, MN, USA, 23–25 October 2005; pp. 49–56.
- Du, Q.; Faber, V.; Gunzburger, M. Centroidal Voronoi Tessellations: Applications and Algorithms. SIAM Rev. 1999, 41, 637–676.
- Kaser, O.; Lemire, D. Tag-cloud drawing: Algorithms for cloud visualization. arXiv, 2007; arXiv:cs/0703109. Available online: https://arxiv.org/abs/cs/0703109 (accessed on 13 December 2016).
- Seifert, C.; Kump, B.; Kienreich, W.; Granitzer, G.; Granitzer, M. On the beauty and usability of tag clouds. In Proceedings of the 2008 12th IEEE International Conference Information Visualisation (INFOVIS), London, UK, 9–11 July 2008; pp. 17–25.
- Feinberg, J. Wordle-Beautiful Word Clouds. Available online: http://www.wordle.net (accessed on 16 December 2016).
- Pu, J.; Liu, S.; Qu, H.; Ni, L. Visual fingerprinting: A new visual mining approach for large-scale spatio-temporal evolving data. In Advanced Data Mining and Applications; Springer: Berlin/Heidelberg, Germany, 2012; pp. 502–515.
- Pu, J.; Liu, S.; Ding, Y.; Qu, H.; Ni, L. T-Watcher: A new visual analytic system for effective traffic surveillance. In Proceedings of the 2013 IEEE 14th International Conference on Mobile Data Management (MDM), Milan, Italy, 3–6 June 2013; Volume 1, pp. 127–136.
- Heer, J.; Card, S.K.; Landay, J.A. Prefuse: A Toolkit for Interactive Information Visualization. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Portland, OR, USA, 2–7 April 2005; pp. 421–430.
- Cao, N.; Gotz, D.; Sun, J.; Qu, H. Dicon: Interactive visual analysis of multidimensional clusters. IEEE Trans. Vis. Comput. Graph. 2011, 17, 2581–2590.
© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).
Pu, J.; Teng, Z.; Gong, R.; Wen, C.; Xu, Y. Sci-Fin: Visual Mining Spatial and Temporal Behavior Features from Social Media. Sensors 2016, 16, 2194. https://doi.org/10.3390/s16122194