Abstract

Data cube lattices and concept lattices are two important models for data analysis and knowledge systems. Both are essentially lattice structures, which in real-world data are irregular. However, their structural characteristics and the relationship between them are not yet clear. To the best of our knowledge, no prior work has examined this issue from the perspective of graph data, despite the importance of structure in lattice data. In this paper, we first examine the structural statistics of lattice data from three aspects: the degree distribution, the clustering coefficient, and the average path length. We demonstrate on various datasets that data cube lattices and concept lattices share topological similarities that are, in general, different from those of random networks and complex networks. Specifically, the degree distribution of lattice data follows the Poisson distribution, and lattice data have a smaller clustering coefficient and a greater average path length. We further discuss and explain these characteristics by building an analytical model and examining the generation mechanism.

1. Introduction

The data cube (lattice) [1], proposed by Gray et al., is a core data model in data warehousing and online analytical processing (OLAP) [2]. It plays a critical role in online data analysis and processing, especially in the era of big data. It allows data to be modeled and analyzed intuitively in the context of multiple analysis dimensions and plays a vital role in business intelligence. Based on data cubes, online analysis operations such as rolling up, drilling down, slicing, and rotation can be carried out easily. Conceptually, users can analyze the data along a dimension hierarchy at various coarse-grained levels, so that a large amount of detailed data is described in a more concise, summarized fashion, which helps users obtain a general view. Conversely, they can specialize the data along a dimension hierarchy down to fine-grained data. The data cube lattice is generated according to the computational dependence between rolling up and drilling down.

The concept lattice [3], proposed by the German mathematician Wille in 1982, is an important model in formal concept analysis (FCA) and is considered an essential facility for data analysis. Formally, the concept lattice takes the triple $(U, A, I)$ as its formal background, in which $U$ is the object set, $A$ is the attribute set, and $I$ is the binary relation between the object set and the attribute set. It derives the formal concept $(X, B)$, where X is called the concept extent and B is called the concept intent; the concepts in the concept set form the concept lattice according to the partial order relation. Each node of a concept lattice denotes both the intent and the extent of a concept, and the relationship between nodes reflects the generalization and specialization of concepts. A concept hierarchy model is established according to the dependency of the knowledge body in the intent and the extent. Concept lattices are widely used in machine learning, pattern recognition, expert systems, decision analysis, data mining, information retrieval, computer networks, software engineering, and many other fields [4, 5]. For example, most data mining tasks can generate a large number of concepts. As the organizational form of these concepts, the lattice structure has many advantages in knowledge discovery: it facilitates a deep understanding of the dependencies between different concepts selected from a data set.

Essentially, the instances of data cube lattices and concept lattices all belong to lattice structure data. As core models, data cube lattices and concept lattices are widely used in data analysis. From a higher perspective, their algebraic structures are intrinsically lattices, which implies that they share similarities or connections in their external characteristics.

Nedjar et al. [6] and Shi [7] demonstrated their relationship in terms of the generation mechanism. Nedjar et al. [6] pointed out that, in data warehousing and data mining, frequent closed itemset search, the concise representation of association rules, and data cube lattices can all use formal concept analysis for feature description, compressed representation, and efficient computation. Furthermore, they put forward the Agree Concept Lattice and the Quotient Agree Lattice to show the close relationship between them. Shi [7] made a thorough study of concept lattices and found that they are closely related to data cube lattices: both are based on partial order relations. When base tables are used as formal backgrounds, the covering equivalence classes of the data cube lattice and the equivalent feature sets of formal concept analysis yield the same partition; on this basis, the Aggregate Concept, Aggregate Concept Lattice (ACL), and Reductive Aggregate Concept (RAC) were proposed.

Analyzing the structure of these two crucial and representative data analysis models under a unified framework will help reveal their essence and support the design of more general algorithms. Strogatz [8] argued that network anatomy is important to characterize because structure always affects function. Liu et al. [9] and Zhai et al. [10] described applications of degree, distance, and topological statistics in some real networks. Topological characteristics of a lattice, such as the degree distribution of its nodes, affect the communication overhead (essentially the number of links) across nodes when a lattice is split in a cloud environment. Clustering or partitioning lattice structures from the graph data perspective may also result in space compression. Until recently, however, very little attention has been devoted to the structural characteristics of these lattices and the relationship between them.

To address the above challenges, this paper studies the structural characteristics of data cube lattices and concept lattices in terms of the degree distribution, clustering coefficient, and average path length, through both experimental demonstration and theoretical analysis. By analyzing the graph structure, we can determine whether data cube lattices and concept lattices share similar structural characteristics, whether these differ from those of other networks such as random networks or complex networks, and why. Note that the lattices discussed in this paper are generated from real-life data: they are not completely regular lattices, in which all nodes have the same degree and a plot of the degree distribution contains a single sharp spike (delta distribution) [11]. As shown in Section 3, the nodes of real-world lattices have a much broader range of degrees, implying that randomness is present.

Our main contributions are as follows:
(i) To the best of our knowledge, we are the first to study the graph structural characteristics (such as the degree distribution, clustering coefficient, and average path length) of data cube lattices and concept lattices, and to analyze and demonstrate them on various real datasets.
(ii) We present the structural relationship between data cube lattices and concept lattices. Compared with random networks and complex networks, real-world lattices follow the Poisson distribution and have a smaller clustering coefficient and a greater average path length.
(iii) We discuss the analytical model and the generation mechanism of data cube lattices and concept lattices.

This paper is structured as follows. Section 2 introduces the model of data cube lattices and concept lattices and relevant definitions. Section 3 presents the structural characteristics of data cube lattices and concept lattices through the experimental study and validates the similarity between data cube lattices and concept lattices on structural characteristics. Section 4 discusses the analytical model and the generation mechanism of data cube lattices and concept lattices. The last section concludes this paper and describes some future works.

2. Preliminary

2.1. Data Cube Lattices

Data cube lattices generalize group-by operators and aggregate each combination of group-by attributes. The grouped attributes are called dimensions D, and the aggregated attributes are called measurements; each combination of grouped attributes is called a cuboid (also called a view). Correspondingly, a cuboid containing i dimensions is called an i-dimension cuboid. A data cell is a tuple $(d_1, d_2, \ldots, d_n)$ in a cuboid, where $d_i$ is the ith dimension attribute value.

For a d-dimensional data cube lattice, $2^d$ group-by views (or cuboids) are generated since each combination of group-by attributes is computed.

Definition 1. (partial order relation). If , , then u, have partial order relation, denoted by or . If and , then it is denoted as or . We can say that u generalizes , or specializes u. In other words, u drills down to or rolls up to u. If , , then .

Definition 2. (data cube lattice). A data cube is aggregated from a base table of data warehousing on various combinations of dimension attributes. It contains the data cells. Let be the data cells. The partial order relation : induces the data cube lattice structure.

Definition 3. (base tuple set). Given a data cell , the base tuple set of c, , i.e., the set of all base table tuples that roll up to c.

Definition 4. (covering equivalence). Suppose that , and is defined as the covering equivalence relation. The covered equivalence classes is the sets .

Definition 5. (upper and lower bounds). Let $(A, \le)$ be a partially ordered set. An element a in A is called an upper bound (lower bound) of the subset M of A if, for every element m in M, $m \le a$ ($a \le m$).

Example 1. Table 1 is a base table of product sales, which has three dimension attributes of Product, Time, and Store and one measurement attribute, Sales. The data cube lattice generates eight group-by views by aggregation, which form the cube lattice structure based on the relationship (as shown by the connection between nodes) between rolling up (generalization) and drilling down (specialization).
Figure 1 is a data cube lattice generated by aggregation of the Sum operation on sales in Table 1. Note that denotes the dimension attribute value ALL. The data cells , , and have the semantic relations of rolling up and drilling down each other. The partial order relationship between them forms a data cube lattice.
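The cuboid level of Example 1 can be sketched in a few lines of Python. This is a minimal illustration, not the construction algorithm used in the paper: it enumerates only the group-by views (not the data cells of Figure 1), and the function name `cuboid_lattice` is hypothetical.

```python
from itertools import combinations

def cuboid_lattice(dimensions):
    """Return (cuboids, edges) for the group-by lattice over `dimensions`.

    Each cuboid is a frozenset of grouped dimensions. An edge links a cuboid
    to every cuboid obtained by rolling up (dropping) exactly one dimension.
    """
    cuboids = [frozenset(c)
               for k in range(len(dimensions) + 1)
               for c in combinations(dimensions, k)]
    edges = [(c, c - {d}) for c in cuboids for d in c]
    return cuboids, edges

cuboids, edges = cuboid_lattice(["Product", "Time", "Store"])
print(len(cuboids))  # 8 group-by views for 3 dimensions
print(len(edges))    # 12 roll-up edges
```

With three dimensions the sketch yields the eight group-by views mentioned in Example 1; reading each edge in the opposite direction gives the corresponding drill-down.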

2.2. Concept Lattices

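Definition 6. (concept lattice). Let I be a binary relation between U and A, where U is the object set and A is the attribute set. Given $x \in U$ and $y \in A$, when $(x, y) \in I$, we say that the object x owns the attribute y, and the triple $(U, A, I)$ is called the formal background. On a triple $(U, A, I)$, we take a subset $X$ of U and a subset $B$ of A and obtain the concept set under relation I: $L = \{(X, B) \mid X \subseteq U,\ B \subseteq A,\ X' = B,\ B' = X\}$, where $X'$ is the set of attributes owned by every object in $X$ and $B'$ is the set of objects owning every attribute in $B$; $X$ is the extent and $B$ is the intent of a concept in $L$. We define a partial ordering relation on L: $(X_1, B_1) \le (X_2, B_2) \Leftrightarrow X_1 \subseteq X_2$ (or $B_2 \subseteq B_1$). Then $(L, \le)$ is a complete Galois lattice, called the concept lattice of the formal background $(U, A, I)$.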

Definition 7. (equivalent feature set). Let be a formal background, , . If and are satisfied, then M is an equivalent feature set of N, denoted by .

Definition 8. (Hasse diagram). A Hasse diagram is a kind of mathematical diagram used to represent a finite partially ordered set in the form of a drawing of its transitive reduction. Concretely, for a partially ordered set $(S, \le)$, one represents each element of S as a vertex in the plane and draws a line segment or curve that goes upward from $x$ to $y$ whenever $y$ covers $x$ (that is, whenever $x < y$ and there is no $z$ such that $x < z < y$). These curves may cross each other but must not touch any vertices other than their endpoints. Such a diagram, with labeled vertices, uniquely determines its partial order.
According to the definition of Concept Lattice, the concept lattice generated by a formal background given in Table 2 is shown in Figure 2.
Figure 2 shows a Hasse diagram representation of concept lattices corresponding to the formal background of Table 2. Each node in the graph represents a concept, each concept is identified by its extent and intent, and the order relationship between concepts is represented by the edges between nodes. Among them, the concept with the largest extent (corresponding to the smallest intent) in the concept lattice is the largest concept in the concept lattice, which is located at the top of the concept lattice; the concept with the largest intent (corresponding to the smallest extent) in the concept lattice is the smallest concept in the concept lattice, which is located at the bottom of the concept lattice.
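To make the extent, the intent, and the covering relation of Definition 8 concrete, the following is a brute-force sketch on a small, hypothetical formal background (it is not Table 2, and it is not the In-Close algorithm used later in the paper). The names `context`, `extent_of`, `intent_of`, `concepts`, and `hasse_edges` are illustrative only.

```python
from itertools import combinations

# Hypothetical formal background: object -> set of attributes it owns.
context = {
    1: {"a", "b"},
    2: {"a", "c"},
    3: {"a", "b", "c"},
    4: {"b"},
}
attributes = set().union(*context.values())

def extent_of(attrs):
    """Objects that own every attribute in `attrs`."""
    return frozenset(o for o, owned in context.items() if attrs <= owned)

def intent_of(objects):
    """Attributes shared by every object in `objects`."""
    if not objects:
        return frozenset(attributes)
    return frozenset(set.intersection(*(context[o] for o in objects)))

def concepts():
    """All (extent, intent) pairs, found by closing every attribute subset."""
    found = set()
    for k in range(len(attributes) + 1):
        for subset in combinations(sorted(attributes), k):
            ext = extent_of(set(subset))
            found.add((ext, intent_of(ext)))
    return found

def hasse_edges(cs):
    """Pairs (lower, upper) where upper covers lower under extent inclusion."""
    return [(lo, hi) for lo in cs for hi in cs
            if lo[0] < hi[0]
            and not any(lo[0] < mid[0] < hi[0] for mid in cs)]

cs = concepts()
print(len(cs), len(hasse_edges(cs)))  # 6 concepts, 7 covering edges
```

The enumeration is exponential in the number of attributes, so it is only suitable for toy contexts; it is meant to show what the nodes and edges of a diagram like Figure 2 represent.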

3. Lattice Structure Characteristics

Since three concepts (the degree distribution, clustering coefficient, and average path length) play a more prominent role in networks than other quantities and measures [11], we examine them on various real datasets for lattices, particularly their main representatives: data cube lattices and concept lattices.

3.1. Data Cube Lattices’ Structural Characteristics

We used the two classic datasets Foodmart and Weather (http://cdiac.esd.oml.gov/cdiac/ndps/ndp026b.html) for the data cube lattices. Then, we calculated their topology statistics to analyze whether the data cube lattices generated from different data sets have similar structural characteristics.

We randomly extracted 10,000 tuples from Foodmart and generated a data cube lattice Foodmart-1w by adapting the data cube construction algorithm [12]. Then, we randomly extracted 10,000 tuples from Weather data and generated the data cube lattice Weather-1w by the same lattice structure algorithm. They are shown in Table 3.

For the two datasets in Table 3, the structural characteristics of the degree distribution, clustering coefficient, and average path length are calculated.
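The three statistics can be computed from an adjacency structure as in the sketch below. It is a plain-Python illustration under stated assumptions, not the tooling used for the experiments: the graph is undirected and connected, it is stored as a dict mapping each node to the set of its neighbors, and the toy input is the eight-node cuboid lattice of three dimensions rather than Foodmart-1w or Weather-1w.

```python
from collections import Counter, deque

def degree_distribution(adj):
    """Map each degree value to the number of nodes having that degree."""
    return Counter(len(nbrs) for nbrs in adj.values())

def average_clustering(adj):
    """Mean over all nodes of 2r / (k(k-1)); nodes with k < 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        nb = list(nbrs)
        r = sum(1 for i in range(k) for j in range(i + 1, k)
                if nb[j] in adj[nb[i]])
        total += 2.0 * r / (k * (k - 1))
    return total / len(adj)

def average_path_length(adj):
    """Mean shortest-path length (in hops) over reachable ordered node pairs."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

# Toy input: cuboids of 3 dimensions as bitmasks; neighbors differ in one bit.
adj = {n: {n ^ (1 << b) for b in range(3)} for n in range(8)}
print(degree_distribution(adj))   # every node has degree 3
print(average_clustering(adj))    # 0.0, since this toy graph has no triangles
print(average_path_length(adj))   # 12/7, about 1.714
```

The breadth-first search is run from every node, which is quadratic in the graph size; for lattices with tens of thousands of nodes a graph library would normally be used instead.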

Figure 3(a) shows the degree distribution of Foodmart-1w and Weather-1w. The horizontal axis represents the degree value of a node, while the vertical axis represents the total number of nodes with that degree. Both curves first rise sharply and then decrease exponentially. The average node degree of Foodmart-1w is 8.8, and that of Weather-1w is 7.9. The average degrees of data cube lattices built from different data sets are therefore close to each other.

Figure 3(b) shows the clustering coefficient of the two data sets in the data cube lattice. The horizontal axis represents the degree value of a node, while the vertical axis represents the average clustering coefficient of the nodes with that degree. The average clustering coefficients of Foodmart-1w and Weather-1w are 0.0231 and 0.0042, respectively. Both are relatively small.

Figure 3(c) shows the path length distribution of the two data sets in the data cube lattice. The horizontal axis represents the length of the path (hops), and the vertical axis represents the number of node pairs with that path length. The average path lengths of Foodmart-1w and Weather-1w are 5.25 and 7.21, respectively. Figure 3(c) shows that the path length distributions of the two structures are similar.

3.2. Concept Lattices’ Structural Characteristics

For the concept lattices, we used the Mushroom data in the UCI machine learning library (http://archive.ics.uci.edu/ml/datasets), which is the common benchmark dataset in calibrating various concept lattice algorithms. We also analyzed whether the concept lattices generated by different data sets have similar structural characteristics.

We randomly selected 600 tuples from the Mushroom data set of UCI and divided them into two parts, each with 300 tuples. Then, we used In-Close algorithm [13] to generate concept lattices Mushroom-1 and Mushroom-2 and used FcaStone (http://fcastone.sourceforge.net) to transform the concept lattices into the graph structures further, forming the second group dataset, as shown in Table 4.

The degree distribution, clustering coefficient, average path length, and other structural characteristics are calculated by using the data in Table 4. The results are shown in Figures 4(a) to 4(c).

Figure 4(a) shows the degree distribution of Mushroom-1 and Mushroom-2. By comparison, it is an interesting discovery that the two curves in the graph jump sharply first and then decrease exponentially. The average degree of each node of Mushroom-1 is 8.1, while that of Mushroom-2 is 7.9. It can be seen that the average degree of concept lattices of different data sets is not very different.

Figure 4(b) shows the clustering coefficient of the two data sets in the concept lattice, in which the horizontal axis represents the degree value of a node, and the vertical axis represents the average clustering coefficient of the nodes with that degree. The average clustering coefficients of Mushroom-1 and Mushroom-2 are 0.1064 and 0.0842, respectively. Both are relatively small.

Figure 4(c) shows the path length distribution of the two data sets in the concept lattice, in which the horizontal axis represents the path length (hops), and the vertical axis represents the number of node pairs with that path length. The average path lengths of Mushroom-1 and Mushroom-2 are 6.14 and 5.15, respectively. Both of them have a relatively small average path length.

3.3. Relationship between Data Cube Lattices and Concept Lattices

We used the data cube lattice generated from the Foodmart data in the first group and the concept lattice generated from the Mushroom data in the second group, as shown in Table 5, to analyze whether they share similar structural characteristics.

For the two kinds of lattice structure data in Table 5, the degree distribution, clustering coefficient, and average path length are calculated. The data cube lattice and concept lattice are abbreviated as CubeLattice and ConceptLattice, respectively.

Figure 5(a) shows the degree distribution of two different lattice structure data, CubeLattice and ConceptLattice. The degree distributions of both of them rise sharply, reach a peak value, and then decay exponentially. In the degree distribution of CubeLattice, when the degree of a node is 5, the number of nodes reaches a peak value, which is 5,450; in the degree distribution of ConceptLattice, when the degree of a node is 7, the number of nodes reaches a peak value, which is 2,440. By comparison, we have found that the degree distribution of CubeLattice is similar to that of ConceptLattice.

Figure 5(b) shows the distribution of the clustering coefficient for CubeLattice and ConceptLattice. From Figure 5(b), we can see that both of them rise sharply at first, then decline slowly after a peak. Although there are fluctuations in amplitude during the decline process, the overall trend is downward. The average clustering coefficient of CubeLattice and ConceptLattice is 0.0231 and 0.1064, respectively. So, the average clustering coefficient of lattice structure data is relatively small.

Figure 5(c) shows the path length distribution for the two different lattice structures, CubeLattice and ConceptLattice. In both cases, the distribution increases slowly at first, reaches a peak value, and then decreases slowly. The average path length of CubeLattice is 5.25 and that of ConceptLattice is 6.14. The path length distribution of CubeLattice is thus similar to that of ConceptLattice.

3.4. Comparison with Other Main Networks

In order to check whether the structural statistics of lattices differ from those of other main networks such as random networks and complex networks [8, 11], we first extracted 10,000 tuples from Foodmart and generated a data cube lattice. Then, we generated a random graph based on the ER model [14] using SNAP [15]. For the complex network, the data set is a real Facebook social network from the SNAP data sets.

The comparison results are as follows:
(1) The degree distribution of lattices is clearly different from that of Facebook but shares a certain similarity with that of the ER random graph. The degree distribution of Facebook generally follows a heavy-tailed distribution, which indicates the occurrence of nodes with a much higher degree than most other nodes. In contrast, Lattice and ER follow a Poisson-like degree distribution (Figure 6).
(2) The average clustering coefficient of lattices is different from that of complex networks. According to the calculation, the average clustering coefficient of Lattice is 0.0231, and that of ER is 0.0011; both are small. The clustering coefficient in ER random graphs is evenly and randomly distributed, whereas in Lattice the nodes with a smaller degree have the larger clustering coefficient. The average clustering coefficient of Facebook is 0.5225, which accords with the large clustering coefficient of small-world networks and also differs from that of Lattice (Figure 7).
(3) The average path lengths of Lattice and ER are 7.21 and 4.26, respectively. The longest path between two nodes in the lattice structure is related to the number of dimensions, while the longest path between two nodes in the ER random graph is related to the number of nodes. Facebook's average path length is 3.7, and the longest path between two nodes in that network is 6. The path length distribution of the lattice structure is therefore different from that of the Facebook social network (Figure 8).

In summary, in terms of the degree distribution, lattices and random networks both follow the Poisson distribution. Lattices and random networks also both have a small clustering coefficient. However, lattices have a rather large average path length, while random networks have a moderately large one. Typical complex networks have a larger average clustering coefficient and a smaller average path length, and their degree distribution is heavy-tailed.
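For readers who want to reproduce the random-network baseline without SNAP, the ER model can be sampled directly. The sketch below is an assumption-laden stand-in, not the procedure used above: the function name `er_graph`, the seed, and the sizes `n` and `p` are hypothetical, with `p` chosen so that the expected degree `p * (n - 1)` is close to the average degrees reported for the lattices.

```python
import random
from collections import Counter

def er_graph(n, p, seed=0):
    """Sample G(n, p): each of the n(n-1)/2 possible edges is kept with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

n, p = 1000, 0.008            # hypothetical sizes; p * (n - 1) is about 8
adj = er_graph(n, p)
degrees = [len(nbrs) for nbrs in adj.values()]
mean_degree = sum(degrees) / n
histogram = Counter(degrees)  # degree -> number of nodes
print(round(mean_degree, 2), max(histogram, key=histogram.get))
```

In such a graph the node degrees are binomially distributed, which for large `n` and small `p` is well approximated by a Poisson distribution with mean `p * (n - 1)`; this is the sense in which the ER baseline is "Poisson-like".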

4. Some Discussions

4.1. Analytical Model

Data cube lattices have different structural characteristics from random networks and complex networks. First, when lattice-structured data are generated, nodes are not connected according to a random probability; instead, edges are generated according to the partial ordering relationships that hold among nodes. Second, there is no partial order relationship between data cells in the same layer, so lattice-structured data have a clear hierarchical structure. Finally, the lattice has its own regular structure: every two nodes have upper and lower bounds.

We elaborate the analytical model of data cube lattices and concept lattices from three aspects.

4.1.1. Degree Distribution

From the above experiments, we can see that the degree distribution curves of the lattice structure data first rise sharply and then drop exponentially, which resembles the Poisson distribution curve. In order to verify this, we fit the degree distribution of the lattice structure data with both a polynomial curve and a Poisson distribution curve.

Firstly, the degree distribution curve of the data cube lattice (with 10,000 base tuples, 15 dimensions, 13,230 nodes, and 53,424 edges) is fitted by a polynomial curve. In the fitting process, the R-square reaches its maximum value when the polynomial is of degree 6 or higher. In Figure 9, the vertical axis represents the probability of occurrence of nodes with different degree values, the diamond points represent the discrete points of the degree distribution of the data cube lattice, and the triangle curve represents the fitted polynomial curve of degree 6. The value of R-square is about 0.89, which indicates a reasonably good fit to the original degree distribution curve.

In order to evaluate the accuracy of the curve fitting, the squared error between the discrete points of the degree distribution curve and the corresponding points of the polynomial curve is calculated. The result is about 0.0084.

Then, the Poisson distribution curve is fitted to the degree distribution curve. The probability expression of the Poisson distribution is as follows:
$$P(X = k) = \frac{\vartheta^{k} e^{-\vartheta}}{k!}, \quad k = 0, 1, 2, \ldots \tag{1}$$
where $\vartheta$ denotes the expected value. The degree values of the nodes in the data cube lattice are multiplied by their corresponding probabilities and summed to obtain the expected value, $\vartheta = \sum_{k} k \, p(k)$, where $p(k)$ is the fraction of nodes with degree $k$. Then, the Poisson distribution curve can be drawn, as shown in Figure 10. The vertical axis represents the probability of occurrence of nodes with different degree values, the curve with diamonds represents the simulated Poisson distribution, the curve with squares represents the degree distribution of the whole lattice structure, and the curve with triangles represents the degree distribution of the nodes in the eleventh layer of the lattice structure. It can be seen that both the degree distribution of the whole structure and that of a single layer closely fit the Poisson distribution curve. Another data cube lattice (with 10,000 base tuples, 12,088 nodes, and 54,658 edges) is also fitted with the Poisson distribution curve. As shown in Figure 11, the degree distributions of different lattice structure data closely fit the Poisson distribution curve.

The squared error between the discrete points of the degree distribution curve and the Poisson distribution is also calculated. The result is about 0.0038, which is much smaller than the error of the polynomial fit. Combined with the additivity of the Poisson distribution, this further confirms that the degree distribution of the lattice structure is similar to the Poisson distribution.
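The Poisson comparison described above can be sketched as follows. This is an illustration only: the degree histogram is made up (it is not the Foodmart lattice), the "squared error" is assumed here to mean the sum of squared differences over the observed degree values, and the function names are hypothetical.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability of the value k for the given expected value."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def poisson_fit_error(degree_counts):
    """Return (expected degree, sum of squared errors against the Poisson pmf)."""
    total = sum(degree_counts.values())
    empirical = {k: c / total for k, c in degree_counts.items()}
    # Expected value: each degree value weighted by its empirical probability.
    mean = sum(k * prob for k, prob in empirical.items())
    sse = sum((prob - poisson_pmf(k, mean)) ** 2
              for k, prob in empirical.items())
    return mean, sse

# Hypothetical histogram (degree -> number of nodes), not the paper's data.
counts = {2: 5, 3: 10, 4: 20, 5: 25, 6: 20, 7: 12, 8: 8}
mean, sse = poisson_fit_error(counts)
print(mean, sse)
```

Because the expected value is taken directly from the empirical distribution, no separate parameter search is needed; the only quantity to compare between candidate curves is the resulting error.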

4.1.2. Clustering Coefficient

If a node i is chosen arbitrarily in the lattice and the degree of i is $k_i$, then the number of possible edges among the adjacent nodes of node i is $k_i (k_i - 1)/2$. If the actual number of edges between the adjacent nodes is r, the clustering coefficient of node i is as follows:
$$C_i = \frac{2r}{k_i (k_i - 1)} \tag{2}$$

The average clustering coefficient of all nodes in the whole lattice structure is as follows:
$$C = \frac{1}{N} \sum_{i=1}^{N} C_i \tag{3}$$
where N denotes the total number of nodes in the lattice structure.

Suppose that node i is connected with nodes j and k, which lie in the same layer. Since there is no edge between nodes in the same layer, j and k are not adjacent, so the value of r is rather small.

Fewer nodes are connected across layers, so there are fewer edges between the nodes connected with the same node.

In concept lattices, there are few nodes connected across layers and few edges between the nodes connected with the same node. Thus, the clustering coefficient of concept lattices in the graph structure is also very small.

Therefore, combining equations 2 and 3, we can see that data cube lattice and concept lattice have smaller clustering coefficients.

4.1.3. Average Path Length

If the dimension number of nodes in the lattice structure is h, the number of layers is also h, since the lattice rolls up from the bottom cuboid to the top cuboid. Taking any two nodes i and j, the minimum number of edges that must be traversed from i to j is called the shortest distance from i to j and is denoted as $d_{ij}$. The average distance of all pairs of nodes in the data cube lattice is as follows:
$$L = \frac{2}{M (M - 1)} \sum_{i > j} d_{ij} \tag{4}$$
where M is the total number of nodes of the data cube lattice, and each $d_{ij}$ is bounded by a quantity that depends on the number of layers h. Therefore, the data cube lattice has a relatively small average path length.
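As a sanity check on the average distance, consider the idealized case in which each of the $2^h$ group-by combinations of $h$ dimensions is a node and edges join combinations that differ in exactly one dimension. This is an assumption made purely for illustration; the real data cell lattices of Section 3 are irregular. In this case the shortest distance between two nodes equals the number of dimensions in which they differ, so that for any fixed node $i$:

```latex
\sum_{j \neq i} d_{ij} \;=\; \sum_{k=1}^{h} k \binom{h}{k} \;=\; h \, 2^{h-1},
\qquad
L \;=\; \frac{h \, 2^{h-1}}{2^{h} - 1} .
```

For $h = 3$ this gives $L = 12/7 \approx 1.71$, and $L$ approaches $h/2$ as $h$ grows, so in this idealized case the average path length is governed by the number of dimensions rather than by the number of nodes.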

For concept lattices, the following definitions are given to facilitate the analysis of their graph structure model:

The Hasse diagram of an m-layer concept lattice consists of a triple $(N, E, m)$, where N is the set of nodes, E is the set of edges, and m is the number of layers of the concept lattice graph. In this hierarchical structure, the following properties hold:
(1) N consists of m nonempty subsets, $N = N_1 \cup N_2 \cup \cdots \cup N_m$, where $N_i \cap N_j = \emptyset$ for $i \neq j$, and $N_i$ is the ith layer in the graph.
(2) If the edge $e = (x, y) \in E$ with $x \in N_i$ and $y \in N_j$, then $|i - j|$ is the span of edge $e$, that is, the distance between the node x and the node y.

Let be the distance from the node x to the node y; we have . Combining equation 4, it can be deduced that the graph structure of the concept lattice also has a relatively small average path length.

4.2. Generation Mechanism

According to Definition 2 and Definition 6, data cube lattices are derived from base tables in the data warehouse, and concept lattices are established on a formal background. When the base table is stored in the relational model, it describes the relationship between attributes and tuples, while the formal background describes the relationship between attributes and objects. Attributes Product, Time, and Store in Figure 1 can be mapped to attributes a, b, and c in Figure 2, the tuple “1” in Figure 1 can be mapped to the object “1” in Figure 2, and so on. In this way, the attributes and tuples of the base table correspond one to one to the attributes and objects of the formal background. Therefore, the base table of the data cube lattice and the formal background of the concept lattice have the same structure.

According to [16], when the base table in the data warehouse is taken as the formal background, there is one-to-one correspondence between the covering equivalence class in the data cube lattice and the equivalent feature set in the concept lattice. They both have the same covering tuple set, and the upper bound of each covering equivalence class corresponds to the concept contained in the equivalent feature set. Combined with the definition of equivalent feature sets in Definition 7, the following theorem holds in [16].

Theorem 1. When the base table is taken as the formal background, i.e., , where U corresponds to the tuple set of the base table, A corresponds to the dimension attribute set of the base table (without measurement attributes). Let (, N″) be the concept of the corresponding N, where Cube is a data cube lattice derived from the base table, and c is a data cell. If is satisfied, then and the upper bound u of is the corresponding conceptual intent of N.
Based on Theorem 1, we have the following corollary:

Corollary 1. Let and , then the corresponding data cube lattice is equivalent to the concept lattice , denoted by .

For example, let , then the data cell c may take any value of , , and in Figure 1. The corresponding concept of L is , and is not only the upper bound of but also the intent of the concept . Corollary 1 proves that if the base table of the data warehouse is taken as the formal background, the concept lattice derived from the base table is the same as the structure of the data cube lattice which only preserves the upper bounds in equivalence classes (as shown in Figure 12).
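The correspondence can be checked mechanically on a toy example. The sketch below assumes that two data cells are covering-equivalent when the same base tuples roll up to them, and it treats each (dimension, value) pair as an attribute of the formal background; the base table, the `*` marker for ALL, and all function names are hypothetical rather than taken from Table 1 or from [16].

```python
from itertools import product

ALL = "*"
# Hypothetical base table restricted to its dimension attributes
# (Product, Time, Store); these rows are not Table 1 from the paper.
base = [("P1", "T1", "S1"), ("P1", "T2", "S1"), ("P2", "T1", "S2")]

def cover_set(cell):
    """Indices of the base rows that roll up to `cell`."""
    return frozenset(
        i for i, row in enumerate(base)
        if all(c == ALL or c == r for c, r in zip(cell, row)))

def upper_bound(rows):
    """Most specific cell that still covers every row in `rows`."""
    columns = zip(*(base[i] for i in rows))
    return tuple(v[0] if len(set(v)) == 1 else ALL for v in columns)

def shared_attributes(rows):
    """Concept intent: (dimension index, value) pairs common to all rows."""
    return frozenset.intersection(
        *(frozenset(enumerate(base[i])) for i in rows))

# Group every cell with a nonempty cover set into covering equivalence classes.
domains = [sorted({row[d] for row in base}) + [ALL]
           for d in range(len(base[0]))]
classes = {}
for cell in product(*domains):
    rows = cover_set(cell)
    if rows:
        classes.setdefault(rows, []).append(cell)

for rows, cells in classes.items():
    ub = upper_bound(rows)
    intent = shared_attributes(rows)
    # The non-ALL coordinates of the class upper bound are exactly the intent.
    assert {(d, v) for d, v in enumerate(ub) if v != ALL} == intent
    print(sorted(rows), ub, len(cells))
```

On this toy table the 27 candidate cells fall into six classes, and in each class the upper bound carries exactly the attribute values shared by the covered rows, which is the behavior that Theorem 1 and Corollary 1 describe.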

5. Related Work

In order to improve the performance of online analysis and decision-making in data warehousing, Gray et al. proposed the data cube operator CUBE [1], which generalizes the group-by, cross-tab, and subtotal operators and preaggregates and materializes the attributes (i.e., dimensions) of group-bys.

Lakshmanan et al. [12] proposed quotient cubes to compress data cubes efficiently. The quotient cube uses cover partitioning to group data cells with the same upper bound into classes so as to preserve the semantics of the data cube. The upper and lower bounds of each class are preserved to achieve data cube compression. The closed data cube proposed in [17] also compresses data cubes through equivalence classes. The difference is that the closed data cube keeps only the upper bound of each class, so it compresses the data cube more efficiently.

In 1982, the German mathematician Wille first proposed the theory of formal concept analysis based on concepts and conceptual levels. The core data model is the concept lattice, also known as the Galois lattice, which is used to discover, sort, and display concepts [3]. At present, the construction methods of concept lattices are divided into batch processing algorithms [18] and progressive algorithms [13]. Zhang et al. [19] studied how to quickly and effectively adjust the original concept lattice to obtain the concept lattice of a new formal background after some attributes have been reduced, rather than reconstructing it in the traditional way. Sarmah et al. [20] proposed an incremental algorithm to reduce multiple attributes; compared with incremental algorithms that reduce a single attribute at a time, it only needs to be executed once. With the further development of concept lattice theory and methods, fuzzy theory, spatial clustering, granular computing, and other fields have been integrated with formal concept analysis and concept lattices, resulting in new applications [16, 21].

To date, none of the above work studies the structural characteristics of data cube lattices and concept lattices, despite the importance of structure.

6. Conclusion and Future Work

This paper analyzes and demonstrates the structural characteristics of data cube lattices and concept lattices based on synthetic and real data sets. We found similarities between them in the degree distribution, clustering coefficient, and average path length. We further discussed the similarity of their analytical models and generation mechanisms from an intrinsic perspective.

Our results show initial promise for exploring the structural characteristics of data cube lattices and concept lattices in data analysis; however, many directions remain for future research. Next, we will investigate whether data cube lattices and concept lattices can be unified under the same framework, since they are externally and intrinsically similar. Efficient algorithms for construction, reduction, or querying in data cube lattices or concept lattices could then be applied to each other or generalized. In addition, we intend to use lattice structural characteristics such as the degree distribution to facilitate the partitioning of lattice structure data in a distributed cloud environment, since structure is the key factor in balancing load and minimizing communication cost.

Data Availability

The data used to support the findings of this study are available.

Conflicts of Interest

There are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors are grateful to Yang Wang for his early work related in part to this paper when he was a graduate student in our lab. This work was partially supported by the Natural Science Foundation of China (61462050 and 61562054) and the Natural Science Foundation of Yunnan Province (KKSY201603016).