Mining treasure in user data

Data mining is related to machine learning, information retrieval, statistics, databases, and even data visualization. Geoffrey Barbier and Huan Liu quote the following definition of data mining:

“…data processing using sophisticated data search capabilities and statistical algorithms to discover patterns and correlations in large preexisting databases; a way to discover new meaning in data” [1]

Just as traditional miners extract rare metals from earth and ore, data miners seek to extract meaningful information from data sets, information that is not readily apparent and not always easy to obtain. Unlike metals in the earth, however, the data is generated by users, and users are alive. This gives data miners the opportunity to ask users for help in obtaining more accurate data. In many cases, data mining is an interactive process between users and the system [3].

This post focuses on the structure of systems that mine personal user information. In September 2006, Microsoft Corporation filed a patent application titled “Personal data mining”. The patent was issued in February 2010. In it, Microsoft presents a detailed structure of a personal user information data mining system that serves as a good example.

First, we need an overall understanding of the data mining process. The main part of this process is handled by three components: the data configuration component, the data mining component, and the application component.

As Microsoft claims in the patent, “…a component may be, but is not limited to being, a process running on a processor, a processor, an object, an instance, an executable, a thread of execution, a program, and/or a computer.” [2] In this post, we treat a component as an abstract mechanism that can exist in different forms.

The data configuration component normalizes personal user data from a plurality of disparate taxonomies into a single taxonomy. The data mining component identifies one or more correlations in the normalized personal user data. Finally, the application component retrieves personal user data from user devices, stores it in the data repository, and provides human users with the identified correlations. It also applies an additional level of processing to the data mining results to interpret them and provide users with useful information [2]. For example, users of some music websites regularly use a feature named “guess what I like”. This feature plays music chosen by analyzing the results of personal data mining.
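As a rough illustration of the first two components, the sketch below maps records from two hypothetical sources onto one shared taxonomy and then counts which normalized categories occur together for the same user. The source names, the mapping table, and the use of a simple co-occurrence count as a stand-in for "correlation" are all assumptions made for this post; the patent does not prescribe any of them.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mapping from (source, source-specific label) to a shared taxonomy.
TAXONOMY_MAP = {
    ("music_app", "genre:jazz"): "music/jazz",
    ("web_history", "jazz-radio"): "music/jazz",
    ("music_app", "genre:rock"): "music/rock",
    ("web_history", "coffee-shops"): "food/coffee",
}


def normalize(records):
    """Data configuration step: map each record onto the shared taxonomy."""
    normalized = []
    for user, source, label in records:
        category = TAXONOMY_MAP.get((source, label))
        if category is not None:  # labels we cannot map are dropped
            normalized.append((user, category))
    return normalized


def co_occurrences(normalized):
    """Data mining step: count category pairs that appear for the same user."""
    by_user = {}
    for user, category in normalized:
        by_user.setdefault(user, set()).add(category)
    pairs = Counter()
    for categories in by_user.values():
        for pair in combinations(sorted(categories), 2):
            pairs[pair] += 1
    return pairs


# Made-up sample data for illustration only.
records = [
    ("alice", "music_app", "genre:jazz"),
    ("alice", "web_history", "coffee-shops"),
    ("bob", "web_history", "jazz-radio"),
    ("bob", "web_history", "coffee-shops"),
    ("bob", "music_app", "genre:rock"),
]
pairs = co_occurrences(normalize(records))
print(pairs.most_common(1))
```

In this toy data set, the pair that occurs most often is jazz together with coffee, which is the kind of result an application component could turn into a recommendation.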

Figure 1

In this system, the application component is a mechanism that supports user interaction with the data mining component and the data repository. The mechanism is flexible enough to be deployed in an application or on a website. In Figure 2, the application component resides on the server. In Figure 3, it resides on the user’s device. The application component itself contains several components, which are shown in Figure 4. Among them, the interaction component is a good example of how data mining techniques affect social media.

Figure 2

Figure 3

Figure 4

The interaction component includes a setup component, a user preference storage component, a view component, a notification component, and a context component that keeps track of the current usage situation.

Figure 5

The setup component allows users to explicitly set, modify or delete their preferences, which are stored in the preference storage.
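A minimal sketch of this relationship, assuming an in-memory dictionary as the preference storage, might look as follows. The class names, method names, and preference keys are hypothetical and do not come from the patent.

```python
class PreferenceStore:
    """Hypothetical in-memory stand-in for the preference storage."""

    def __init__(self):
        self._prefs = {}

    def get(self, key, default=None):
        return self._prefs.get(key, default)

    def put(self, key, value):
        self._prefs[key] = value

    def remove(self, key):
        # Removing a preference that was never set is not an error.
        self._prefs.pop(key, None)


class SetupComponent:
    """Lets the user explicitly set, modify, or delete preferences."""

    def __init__(self, store):
        self.store = store

    def set_preference(self, key, value):
        # Setting an existing key overwrites it, which also covers "modify".
        self.store.put(key, value)

    def delete_preference(self, key):
        self.store.remove(key)


store = PreferenceStore()
setup = SetupComponent(store)
setup.set_preference("result_format", "text")
setup.set_preference("result_format", "picture")  # modify an existing preference
setup.set_preference("quiet_while_driving", True)
setup.delete_preference("quiet_while_driving")
print(store.get("result_format"))
```

Keeping the store separate from the setup component means other components can read preferences without being able to change them through the setup interface.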

The view component presents data mining results to the user. What to show and how to present it are governed by the preferences in the preference store. For instance, users can search for information about food in text or picture format. If users tell the view component which format they prefer, the component does not need to infer it. Having users state their preferred format explicitly makes the inference task easier. The view component can then concentrate on monitoring interaction with the data and learning preferences that help users easily navigate the content (not the format) that is important to them [2]. This strategy is widely used in data mining.
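The trade-off between an explicit preference and inference can be sketched as below. This assumes preferences are a plain dictionary and that, without an explicit preference, the format is guessed from the formats the user opened most often; the function name, the `result_format` key, and the fallback rule are illustrative assumptions, not the patent's method.

```python
from collections import Counter


def choose_format(preferences, interaction_log, default="text"):
    """Pick a presentation format for data mining results.

    preferences: dict of explicitly stored user preferences (hypothetical keys).
    interaction_log: list of formats the user opened in the past.
    """
    explicit = preferences.get("result_format")
    if explicit is not None:
        # An explicit preference wins, so no inference is needed.
        return explicit
    if interaction_log:
        # Otherwise guess the most frequently used format.
        return Counter(interaction_log).most_common(1)[0][0]
    return default


print(choose_format({"result_format": "picture"}, ["text", "text"]))
print(choose_format({}, ["text", "picture", "picture"]))
```

The first call returns the stored preference even though the log suggests otherwise; only the second call has to infer anything.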

The notification component notifies users of data mining results. Again, users can set preferences for how notifications should be delivered and whether some kinds of notifications should be suppressed. The notification component can also be assisted by rule-based logic and/or machine learning mechanisms to determine whether and how to notify a user [2].

The context component can affect both the view component and the notification component. For example, consider a user who drives home at 6 pm every day and stores a preference in the preference store not to be disturbed by notifications while driving. If the notification component needs to send a notification around this time, the context component can help it find a suitable time for delivery. For example, the context component may be able to detect when the user’s car is stopped [2].
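The driving example can be written as a small piece of rule-based logic. The sketch below is only one possible reading: the `Context` fields, the preference keys, and the three outcomes ("suppress", "defer", "deliver") are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Hypothetical snapshot supplied by the context component."""
    driving: bool
    car_stopped: bool = False


def decide(notification_kind, preferences, context):
    """Return 'suppress', 'defer', or 'deliver' for one notification."""
    # Rule 1: the user has switched this kind of notification off.
    if notification_kind in preferences.get("suppressed_kinds", set()):
        return "suppress"
    # Rule 2: do not disturb the user while the car is moving.
    if (preferences.get("quiet_while_driving")
            and context.driving
            and not context.car_stopped):
        return "defer"
    return "deliver"


prefs = {"quiet_while_driving": True, "suppressed_kinds": {"ads"}}
print(decide("new_playlist", prefs, Context(driving=True)))
print(decide("new_playlist", prefs, Context(driving=True, car_stopped=True)))
print(decide("ads", prefs, Context(driving=False)))
```

A deferred notification would be retried when the context changes, for instance once the car is reported as stopped.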

To summarize, these components describe a typical process of user-assisted data mining and provide a picture of how users interact with data mining systems.

References:

1. Geoffrey Barbier, Huan Liu (2011): Data Mining in Social Media. Social Network Data Analytics. Springer Science+Business Media.

2. Microsoft Corporation (2011): Personal Data Mining. US 20080082467 A1, USPTO.

3. Lei Tang, Huan Liu (2010): Community Detection and Mining in Social Media. Synthesis Lectures on Data Mining and Knowledge Discovery, Morgan & Claypool Publishers.