Wavelet-Based Clustering for Very Large Multidimensional Datasets

Gholam Sheikholeslami, Dantong Yu, Surojit Chatterjee and Aidong Zhang
Department of Computer Science and Engineering
State University of New York at Buffalo
Buffalo, NY 14260, USA

This research is partially supported by an NSF CAREER grant IIS

Abstract

Clustering large multidimensional datasets is an important problem: it seeks the densely populated regions of the data space for use in data mining, knowledge discovery, or efficient information retrieval. A good clustering approach should be efficient and should detect clusters of arbitrary shape. It must be insensitive to noise (outliers) and to the order of the input data. In this article, we introduce a novel clustering approach based on wavelet transform which satisfies all the above requirements. Using the multi-resolution property of wavelet transform, we can effectively identify arbitrarily shaped clusters at different degrees of detail. We demonstrate that wavelet-based clustering can be efficiently applied to both low-dimensional and high-dimensional datasets.

1 Introduction

There are many databases, such as financial, crystallography and corporate databases, in which very large multi-dimensional datasets with numerical attributes exist. Clustering in data mining is the discovery of interesting patterns that may exist in the underlying data. Because of the large size of these databases, a primary requirement for clustering algorithms for data mining is efficiency. The clustering technique should be fast and should scale with the number of dimensions and the size of the input. Also, due to the diverse nature and characteristics of the sources of the data, the clusters may assume arbitrary shapes. They may be nested within one another, may have holes inside, or may possess concave shapes. The problem of handling arbitrary shapes in high dimensions is particularly complex. A good clustering approach should be unaffected by outliers (noise) and should detect them effectively. In addition, clustering algorithms should assume minimum domain knowledge, e.g., the number of clusters or the underlying probability distribution, and they should be insensitive to the order of the input data. Another desirable property for clustering algorithms is the ability to produce clusters at different levels of detail, which is termed the multi-resolution property.

State of the Art

One category of clustering methods is partitioning algorithms. Partitioning algorithms construct a partition of a database of N objects into a set of K clusters. Usually they start with an initial partition and then use an iterative control strategy to optimize an objective function. There are mainly two approaches: i) the k-means algorithm, where each cluster is represented by the center of gravity of the cluster, and ii) the k-medoid algorithm, where each cluster is represented by one of the objects of the cluster located near the center. Ng and Han introduced CLARANS (Clustering Large Applications based on RANdomized Search), which is an improved k-medoid method [6]. This is the first method that introduces clustering techniques into spatial data mining problems.

The other category of clustering methods includes hierarchical algorithms, which create a hierarchical decomposition of the database. The hierarchical decomposition can be represented as a dendrogram. The algorithm iteratively splits the database into smaller subsets until some termination condition is satisfied. Hierarchical algorithms do not need K as an input parameter, which is an obvious advantage over partitioning algorithms. The disadvantage is that the termination condition has to be specified. BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) uses a hierarchical data structure called CF-tree, which is a height-balanced tree that stores the clustering features [12]. BIRCH tries to produce the best clusters with the available resources.

Ester et al. [2] presented a clustering algorithm, DBSCAN, relying on a density-based notion of clusters. It is designed to discover clusters of arbitrary shapes. The key idea in DBSCAN is that for each point of a cluster, the neighborhood of a given radius has to contain at least a minimum number of points. DBSCAN can separate the noise (outliers) and discover clusters of arbitrary shapes. CURE (Clustering Using Representatives) utilizes multiple representative points for each cluster, which are generated by selecting well scattered points from the cluster and then shrinking them toward the center of the cluster by a specified fraction [3].

Recently a number of algorithms were presented which quantize the space into a finite number of cells and then do all operations on the quantized space. The main characteristic of these approaches is their fast processing time, which is typically independent of the number of data objects. They depend only on the number of cells in each dimension in the quantized space. Wang et al. proposed a STatistical INformation Grid-based method (STING) for spatial data mining [10]. They divide the spatial area into rectangular cells using a hierarchical structure. They store the statistical parameters of each numerical attribute of the objects within cells. In STING, the hierarchical representation of grid cells is used to search for queries or assign a new object to the clusters.

The CLIQUE clustering algorithm identifies dense clusters in subspaces of maximum dimensionality [1]. It argues that finding the clusters in the subspaces of the original space is more effective, because many dimensions can have noise or uniformly distributed values. Also, in high dimensional spaces, the average density of points anywhere in the data space is likely to be quite low. CLIQUE partitions the data space into cells. To approximate the density of the data points, it counts the number of points in each cell. The clusters are unions of connected high density cells within a subspace. It generates cluster descriptions in the form of DNF expressions. Note that while CLIQUE focuses on finding clusters embedded in subspaces of high dimensional data, our proposed approach, similar to BIRCH and DBSCAN, is designed to detect clusters in the full dimensional space. Hence, in that regard the scope and goal of our method are different from those of CLIQUE.

We first introduced WaveCluster, which partitions the data space into cells and applies wavelet transform on them, in [8]. Using the multi-resolution property of wavelets, WaveCluster can detect arbitrarily shaped clusters at different degrees of detail. Though this approach meets all the desirable properties of a good clustering technique, its performance in terms of required memory and time degrades as the dimensionality of the data increases. Thus, it requires modifications to better handle high dimensional datasets.
In this article, we generalize the concept of the wavelet-based clustering approach and provide approaches to demonstrate its usefulness in both low-dimensional and high-dimensional data spaces.

2 Wavelet-Based Clustering

We first discuss the relationship between multidimensional data and multidimensional signals and show how to use wavelet transform to detect the inherent relationships in the data. We propose to look at the multidimensional data space from a signal processing perspective. The collection of data in the multidimensional data space composes a d-dimensional signal. The high frequency parts of the signal correspond to the regions of the data space where there is a rapid change in the distribution of data, that is, the boundaries of clusters. The low frequency parts of the d-dimensional signal which have high amplitude correspond to the areas of the data space where the data are concentrated. For example, Figure 1 shows a 2-dimensional data space, where the two-dimensional data points have formed four clusters. Each row or column can be considered as a one-dimensional signal, so the whole data space will be a 2-dimensional signal. Boundaries and edges of the clusters constitute the high frequency parts of this 2-dimensional signal, whereas the clusters themselves correspond to the parts of the signal which have low frequency with high amplitude. When the number of data points is high, we can apply signal processing techniques to find the high frequency and low frequency parts of the d-dimensional signal representing the data, and thereby detect the clusters. The key idea is to apply signal processing methods to transform the space and find the dense regions in the transformed space.

Figure 1: A sample 2-dimensional data space.

Wavelet transform is a signal processing technique that decomposes a signal into different frequency subbands (for example, a high frequency subband and a low frequency subband). It is a type of signal representation that can give the frequency content of the signal at a particular instant of time by filtering. A one-dimensional signal $s$ can be filtered by convolving the filter coefficients $c_k$ with the signal values:

$$\hat{s}_i = \sum_{k=0}^{M-1} c_k \, s_{i+k-\frac{M}{2}}, \qquad (1)$$

where $M$ is the number of coefficients in the filter and $\hat{s}$ is the result of the convolution. Wavelet transform provides us with a set of interesting filters. For example, Figure 2 shows the Cohen-Daubechies-Feauveau (2,2) biorthogonal wavelet [9].

Figure 2: Cohen-Daubechies-Feauveau (2,2) biorthogonal wavelet.

We now briefly review wavelet-based multi-resolution decomposition. More details can be found in Mallat's paper [5]. To have a multi-resolution representation of signals we can use the discrete wavelet transform. We can compute a coarser approximation of the one-dimensional input signal $S_0$ by convolving it with the low pass filter $\tilde{H}$ and down-sampling the signal by two [5]. All the discrete approximations $S_j$, $1 < j < J$ ($J$ is the maximum possible scale), can thus be computed from $S_0$ by repeating this process. Figure 3 illustrates the method.

Figure 3: Block diagram of multi-resolution wavelet transform.

$D_j$ denotes the difference between $S_j$ and $S_{j-1}$ and is called the detail signal at scale $j$. We can compute the detail signal $D_j$ by convolving $S_{j-1}$ with the high pass filter $\tilde{G}$ and returning every other sample of the output. The wavelet representation of a discrete signal $S_0$ can therefore be computed by successively decomposing $S_j$ into $S_{j+1}$ and $D_{j+1}$ for $0 \le j < J$. This representation provides information about the signal approximation and the detail signals at different scales.

We can easily generalize the wavelet model to a d-dimensional data space, in which the one-dimensional transform can be applied multiple times. For example, in a 2-dimensional data space, we can represent the data space as an image where each pixel of the image corresponds to one cell in the data space. Wavelet transform can be applied along the axes x and y. It decomposes an image into an average signal (LL) and three detail signals which are directionally sensitive: LH emphasizes the horizontal image features, HL the vertical features, and HH the diagonal features. Figure 4 shows the wavelet representation of the image in Figure 1 at three scales. At each level, LL is shown in the upper left quadrant, LH in the upper right quadrant, HL in the lower left quadrant, and HH in the lower right quadrant.

Figure 4: Multi-resolution wavelet representation at a) scale 1; b) scale 2; c) scale 3.
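
As a rough illustration of the decomposition described above, the following sketch computes one level of a 2-dimensional transform using the Haar filter pair, which is the simplest wavelet and not the Cohen-Daubechies-Feauveau filter shown in Figure 2. The function names `haar_1d` and `haar_2d`, the averaging normalization, and the order in which the axes are filtered are assumptions made for this example; the subband names follow the convention in the text, where LH responds to horizontal features and HL to vertical ones.

```python
def haar_1d(signal):
    """One level of an averaging Haar transform on an even-length list.

    Returns (approximation, detail); each has half the input length,
    which corresponds to filtering followed by down-sampling by two.
    """
    assert len(signal) % 2 == 0, "length must be even"
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail


def _along_columns(rows):
    """Apply haar_1d to every column and return (low, high) as row-major grids."""
    lows, highs = [], []
    for column in zip(*rows):
        a, d = haar_1d(list(column))
        lows.append(a)
        highs.append(d)
    # Transpose back so the result is again a list of rows.
    return [list(r) for r in zip(*lows)], [list(r) for r in zip(*highs)]


def haar_2d(grid):
    """Decompose a 2-D grid of cell counts into the LL, LH, HL, HH subbands."""
    low_rows, high_rows = [], []
    for row in grid:                      # filter along each row first
        a, d = haar_1d(list(row))
        low_rows.append(a)
        high_rows.append(d)
    ll, lh = _along_columns(low_rows)     # low-pass rows, then low/high columns
    hl, hh = _along_columns(high_rows)    # high-pass rows, then low/high columns
    return ll, lh, hl, hh


# A dense 2x2 block of cells survives in LL; applying haar_2d to LL again
# gives the next, coarser scale.
ll, lh, hl, hh = haar_2d([[4, 4, 0, 0],
                          [4, 4, 0, 0],
                          [0, 0, 0, 0],
                          [0, 0, 0, 0]])
```

Repeating `haar_2d` on the returned `ll` grid yields the coarser scales of the kind shown in Figure 4.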

Useful Properties of Wavelet Transform in Clustering

The motivation for using wavelet transform, and thereby finding connected components in the transformed space, is drawn from the following observations.

Unsupervised Clustering: The hat-shaped filters (such as the one shown in Figure 2) emphasize regions where points cluster, but simultaneously tend to suppress weaker information at their boundaries. Intuitively, dense regions in the original space act as attractors for the nearby points and at the same time as inhibitors for the points that are not close enough. This means that clusters in the data automatically stand out and clear the regions around them, so that they become more distinct. It makes finding the connected components in the transformed space easier than in the original space. Figure 5 shows an example of a data space before and after the transform. This dataset contains 500,000 data points in the two clusters plus 25,000 randomly distributed noise points. As the figure shows, the clusters in the transformed space are more salient and thus easier to find.

Figure 5: a) Original data space; b) Transformed space.

Effective Removal of Noise: Noise data are the data that do not belong to any of the clusters, and their presence usually causes problems for current clustering methods. Applying wavelet transform removes the noise in the original space, resulting in more accurate clusters. As we will show, we take advantage of the low-pass filters used in the wavelet transform to automatically remove the noise. Figure 5 shows that the majority of the noise data in the original space are removed after the transformation.

Multi-resolution: The multi-resolution property of wavelet transform can help in detecting the clusters at different levels of detail. As we showed, wavelet transform provides multiple levels of decomposition, which results in clusters at different scales from fine to coarse. The appropriate scale for choosing clusters can be decided based on the user's needs.

Cost Efficiency: Since applying wavelet transform is very fast, it makes our approach cost-effective. As will be shown later, clustering very large datasets takes only a few seconds. Using parallel processing we can get even faster responses.

WaveCluster Algorithm

Given a large set of data, the goal of the algorithm is to detect clusters and to label the data according to the cluster they belong to. The four main steps of the WaveCluster algorithm are:

(1) Quantize the data space: Since we use the discrete wavelet transform, the data space should be quantized before the transform is applied. In quantization, each dimension $A_i$ in the d-dimensional data space is divided into $m_i$ intervals. If we assume that $m_i$ is equal to $m$ for all the dimensions, there would be $m^d$ cells in the data space. The corresponding cell for each data object is then determined based on its attribute values. For each cell we count the number of data objects contained in it to represent the aggregation of the data. The number (or size) of these cells and the aggregation information in each cell are important issues that affect the performance of clustering. Because of the multi-resolution property of wavelet transform, we consider different cell sizes at different scales of the transform.

(2) Apply wavelet transform: The discrete wavelet transform is applied on the quantized data space. The d-dimensional space requires a d-dimensional wavelet transform. Based on the representation of the data space, we have different implementations of the wavelet transform. Applying wavelet transform on the cells results in a new data space and hence new cells.

(3) Find the connected components at different scales: Given the set of new cells, WaveCluster then detects the connected components in the transformed data space. Each connected component is a set of cells in the transformed space and is considered as a cluster. Corresponding to each resolution $r$ of the wavelet transform, there would be a set of clusters $C_r$, where usually the number of clusters is smaller at the coarser resolutions. Each cluster $w$, $w \in C_r$, will have a cluster number.

(4) Map the data to clusters: WaveCluster labels the cells in each cluster in the transformed space with its cluster number. These clusters are in the transformed space and are based on wavelet coefficients. Thus, they cannot be directly used to define the clusters in the original space. WaveCluster makes a lookup table to map the cells in the transformed space to the cells in the original space. Each entry in the table specifies the relationship between one cell in the transformed space and the corresponding cell(s) of the original space. WaveCluster assigns the label of each cell in the original data space to all the data in that cell, and thus the clusters are determined.

Discussion

When the data are assigned to the cells of the quantized space at step 1 of the algorithm, the final content of the cells is independent of the order in which the objects are presented. Since WaveCluster processes only these cells in the remaining steps, the algorithm is insensitive to the order of the input data.

WaveCluster finds the connected components in the average subband of the wavelet transformed space as the output clusters. As mentioned earlier, the average subband is constructed by convolving the low pass filter along each dimension and down-sampling by two. So a wavelet transformed cell will be affected by the content of the cells in the neighborhood covered by the filter. This means that the spatial relationships between neighboring cells will be preserved. The algorithm to find the connected components labels each cell of the transformed space with respect to the cluster that it belongs to. The label of each cell is determined based on the labels of its neighboring cells [4]. It does not make any assumptions about the shape of connected components and can find convex, concave, or nested connected components. Hence, WaveCluster can detect clusters of arbitrary shape.

WaveCluster applies wavelet transform on the data space to generate multiple decomposition levels. Each time we consider a new decomposition level, we ignore some details in the average subband and effectively increase the size of a cell's neighborhood whose spatial relationship is considered. This results in sets of clusters with different degrees of detail after each decomposition level of the wavelet transform. In other words, we will have multi-resolution clusters at different scales, from fine to coarse. In our approach, a user does not have to know the exact number of clusters.
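
The four steps of the algorithm described above can be sketched for a 2-dimensional data space as follows. This is a minimal illustration under several assumptions that are not taken from the text: the function name `wavecluster_sketch`, a single decomposition level with the Haar average subband (the mean of each 2x2 block of cells), a hypothetical `threshold` that decides which transformed cells count as part of a cluster, 8-connectivity, and a breadth-first flood fill in place of the labeling algorithm of [4]. Integer division by two plays the role of the lookup table between transformed and original cells.

```python
from collections import deque


def wavecluster_sketch(points, m, lo, hi, threshold):
    """Label 2-D points; each axis is cut into m intervals over [lo, hi).

    m must be even. Returns one label per point, with 0 meaning noise.
    """
    width = (hi - lo) / m

    def cell_of(v):
        return min(int((v - lo) / width), m - 1)

    # Step 1: quantize the space and count the points in each cell.
    counts = [[0] * m for _ in range(m)]
    for x, y in points:
        counts[cell_of(x)][cell_of(y)] += 1

    # Step 2: average subband of one Haar level (mean of each 2x2 block).
    h = m // 2
    avg = [[(counts[2 * i][2 * j] + counts[2 * i + 1][2 * j]
             + counts[2 * i][2 * j + 1] + counts[2 * i + 1][2 * j + 1]) / 4
            for j in range(h)] for i in range(h)]

    # Step 3: connected components among cells above the threshold.
    labels = [[0] * h for _ in range(h)]
    next_label = 0
    for i in range(h):
        for j in range(h):
            if avg[i][j] > threshold and labels[i][j] == 0:
                next_label += 1
                labels[i][j] = next_label
                queue = deque([(i, j)])
                while queue:
                    a, b = queue.popleft()
                    for da in (-1, 0, 1):
                        for db in (-1, 0, 1):
                            u, v = a + da, b + db
                            if (0 <= u < h and 0 <= v < h
                                    and avg[u][v] > threshold
                                    and labels[u][v] == 0):
                                labels[u][v] = next_label
                                queue.append((u, v))

    # Step 4: map each point to the label of its transformed cell.
    return [labels[cell_of(x) // 2][cell_of(y) // 2] for x, y in points]


# Two dense groups and one isolated noise point.
points = ([(1.1, 1.1)] * 10 + [(1.6, 1.2)] * 10
          + [(8.2, 8.3)] * 10 + [(8.7, 8.8)] * 10 + [(5.0, 2.0)])
labels = wavecluster_sketch(points, m=8, lo=0.0, hi=10.0, threshold=1.0)
```

Because only the cell counts are used after step 1, presenting the same points in a different order produces the same labeling, which is the order-insensitivity property discussed above.
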
A good estimate of the number of clusters does, however, help in choosing the appropriate scale and the corresponding clusters.

One of the effects of applying a low pass filter on the feature space is the removal of noise. WaveCluster takes advantage of this property and removes the noise from the feature space automatically, without requiring extra processing time. WaveCluster is a very fast method and, as its time complexity will show, it performs very efficiently on very large databases. However, the performance of WaveCluster depends on the values of $m$ (the number of intervals in each dimension) and $d$ (the number of dimensions in the data space). In other words, quantization and dimensionality of the data space are two important issues in WaveCluster, which we discuss below.

Quantization of the Space

All the grid-based approaches for clustering spatial data suffer from the Modifiable Areal Unit Problem (MAUP) addressed in [7]. The problem occurs in terms of scaling and aggregation. The problem of scaling is in selecting the appropriate size and number of cells to represent the data. Aggregation is the problem of summarizing the data contained in each cell. All the present grid-based algorithms suffer from these problems.

In general, when the quantization value $m$ is too low (very coarse quantization), more objects will be assigned to the same cell, and there is a higher probability that objects from different clusters belong to the same cell. We call this case the under-quantization problem. It results in merging of the clusters and mislabeling of their objects, so the quality of clustering decreases. In contrast, if the quantization value $m$ is too high (very fine quantization), each object will be in a separate cell, which might be far from the other cells. We call this the over-quantization problem. Over-quantization can result in many unnecessary small clusters (that might later be removed as noise) and does not find the real clusters, so it also decreases the quality of clustering.

Aggregation also plays a role in clustering, and it depends on the kind of algorithm used for clustering. In STING each cell maintains a list of statistical attributes, such as the number of objects in the cell, the mean, standard deviation, minimum, maximum, and type of distribution of the values in the cell [10]. In CLIQUE, proposed by Agrawal et al., each cell is classified as dense or not based on the count value in each cell [1]. But none of these methods discusses the problems regarding aggregation.

We argue that in this context, scaling is a problem inherent in what a human user can call a cluster, in other words, in the definition of a cluster. As Openshaw and Taylor stated, it seems very unlikely that there will ever be either a purely statistical or mathematical solution for MAUP [7]. To have an optimal quantization, application domain information should be incorporated.
While other existing grid-based clustering methods ignore this problem, WaveCluster has the advantage of producing clusters at multiple scales at the same time. This means that the results of WaveCluster implicitly reflect multiple quantizations of the space, resulting in multiple sets of clusters that can be selected based on the user's requirements. We may use a heuristic-based approach to experimentally find a good quantization. We can start with an over-quantized space and try to find reasonable clusters. If necessary, we then increase the size of the cells and repeat the process until we get some acceptable clusters. At this point, WaveCluster, using the multi-resolution property of wavelet transform, can provide multiple sets of clusters at different scales.

Dimensionality of the Space

Assuming $m$ intervals in each of the $d$ dimensions of the data space, there would be $K = m^d$ cells in total. Let the total number of data objects be $N$. Based on the dimensionality of the data space, we use two different representations for the data space. In the WaveCluster algorithm, steps 2 (applying wavelet transform) and 3 (finding connected components) will be different for these two cases, while steps 1 (quantization) and 4 (mapping) are the same.

For low-dimensional spaces, we represent the space using a multi-dimensional matrix. It is a fast method which is simple to implement. This method is appropriate for large databases when $N \ge K$. As an example, for a database with 1,000,000 objects, this condition holds when the number of dimensions $d$ is less than or equal to 6 and the number of intervals $m$ is 10, since then $K = m^d \le 10^6$. We present this approach in Section 3. However, for a high number of dimensions we may have $N < K$, and the time and space complexity of the matrix representation grow exponentially with $d$. We exploit the sparseness of data in such spaces and represent the data space using a hash-based data structure. But applying wavelet transform and finding connected components on this representation are nontrivial problems, which we address in Section 4.

3 Clustering in Low-dimensional Space

Data Space Representation

For low-dimensional data spaces, we can represent the space using a multi-dimensional matrix. Each element of the matrix corresponds to one cell in the quantized space. It provides a simple and fast method to access the information of the neighboring cells of each cell. This information is required to apply wavelet transform or to find the connected components.

Applying Wavelet Transform

Applying wavelet transform on the multi-dimensional matrix is straightforward. Convolution with the filters can be easily done, resulting in subbands at different scales. In our experiments, we applied three-level wavelet transforms using the Haar, Daubechies, and Cohen-Daubechies-Feauveau ((4,2) and (2,2)) wavelets. Average subbands give approximations of the original data space at different scales, which help in finding clusters at different levels of detail.
For example, as shown in Figure 4, for a 2-dimensional data space, the LL subbands show the clusters at different scales.

Finding Connected Components

We use the algorithm in [4] to find the connected components in the 2-dimensional data space (image). The same concept can be generalized to higher dimensions. The label of each cell is specified based on the labels of its neighboring cells. The connected component analysis consists of scanning through the image once to find all the connected components, followed by equivalence analysis to re-label the components. This takes care of components with holes and concave shapes. There are many well known algorithms for finding connected components in images, and we used the one mentioned in [4] for our purpose.

Examples

In Section 2, we showed how WaveCluster handles very large datasets (525,000 objects in the data presented in Figure 5) and how it can remove the noise. We also showed in Figure 4 how it can represent the clusters at different levels of detail. Data mining methods should be capable of handling arbitrarily shaped clusters. Figures 6-a and 6-b show a dataset and its clustering using WaveCluster. There are 2 arbitrarily shaped clusters in the original data, which are correctly detected. This result emphasizes the effectiveness of methods which do not assume the shape of the clusters a priori. Figure 6-c shows an example of a concave-shaped data distribution, and Figure 6-d presents the clustering produced by WaveCluster. These results indicate that WaveCluster also handles such sophisticated patterns well.

Figure 6: a) Original space; b) WaveCluster results; c) Original space; d) WaveCluster results.

WaveCluster is a very fast method, and most of its time is spent in reading the input data. For example, for the datasets presented in Figure 6 (with more than 200,000 objects), it took only 14.5 seconds to cluster (if we apply a 512x512 quantization). About 11 seconds of this time was spent in reading and quantization, and only 3.5 seconds was required for the real processing. We performed our experiments on a SUN SPARC workstation using a 168 MHz UltraSparc CPU with the SunOS operating system and 1024 MB of memory.

Time Complexity

The time complexity of the first and last steps of the WaveCluster algorithm is $O(N)$, because they scan all the database objects. Assuming $m$ cells in each dimension of the feature space, there would be $K = m^d$ cells. The complexity of applying wavelet transform would be $O(K)$. To find the connected components, the required time is $O(K)$. Thus the time complexity of processing the data (without considering I/O), which is performed in steps 2 and 3, would in fact be $O(K)$. Since we assume that $N \ge K$, the overall time complexity of the algorithm is $O(N)$ [8].

While wavelet transform is applied on each dimension of the data space, the required operations for each cell can be carried out independently of the other cells. Thus, parallel processing can speed up transforming the space. The connected component analysis can also be sped up using parallel processing.

4 Clustering in High-dimensional Space

In high-dimensional space, it is expected that data will be sparse and most of the cells in the quantized space will be empty. An efficient way of storing only the nonempty cells in the quantized space is expected to drastically reduce the space complexity. We use a hash table approach to keep track of the nonempty cells only. The main idea is to efficiently represent high-dimensional data in limited memory and perform wavelet transform as well as connected component analysis on this representation. But performing a convolution operation such as wavelet transform on this representation is a nontrivial problem. We present an accumulative approach to calculate wavelet transform in high dimensional space.
Finally, we find k-connected components in high dimensional space by a depth-first search through the hash table.

Data Space Representation

In the quantized space, every cell $c_i$ has the form $\langle c_{i1}, c_{i2}, \ldots, c_{id} \rangle$, which is called the key or index for $c_i$, where $c_{ij} = [l_{ij}, h_{ij})$ is the right-open interval in the partitioning of dimension $A_j$. The address of a cell in the hash table can be calculated by applying an appropriate hash function on the index of the cell. A hash table requires much less storage than a direct-address table, which was used in [8]. Specifically, the storage requirement can be reduced from $O(m^d)$ to the order of $N' d$, where $N'$ is the number of nonempty cells in the quantized feature space. With hashing, a cell $c_i = \langle c_{i1}, c_{i2}, \ldots, c_{id} \rangle$ is stored in the hash bucket $h(c_i)$; that is, a hash function $h$ is used to compute the address for the cell $c_i$. Formally, the hash function $h$ maps the universe $U$ of cells $c_i = \langle c_{i1}, c_{i2}, \ldots, c_{id} \rangle$ into the entries of the hash table $T[0 \ldots n-1]$, where $n$ is the number of buckets in the hash table. That is,

$$h : U \to \{0, 1, \ldots, n-1\}.$$

Designing a hash function. In our approach, since hashing is performed frequently, the time spent on hashing directly affects efficiency. Also, both applying wavelet transform and finding connected components require neighborhood information, that is, the ability to locate the neighbor cells of a given cell. However, hashing permits any element to be mapped into any of the hash table buckets. Thus, it introduces the problem of determining or locating the neighbors of a cell. Another issue is the collision problem, in which two or more cells may be hashed into the same bucket. Our goals in designing the hash function are efficiency, easy computation of neighbor cells, and minimal collision.

We now define our hash function. We randomly generate an integer matrix $A_{l \times d}$, where $l = \log_2 n$ and $n$ is the number of hash table buckets, as follows:

$$A_{l \times d} = \begin{pmatrix} r_{1,1} & r_{1,2} & \cdots & r_{1,d} \\ r_{2,1} & r_{2,2} & \cdots & r_{2,d} \\ \cdots & \cdots & \cdots & \cdots \\ r_{l,1} & r_{l,2} & \cdots & r_{l,d} \end{pmatrix}.$$

For each key $c_i = \langle c_{i1}, c_{i2}, \ldots, c_{id} \rangle$, the hash function $h(c_{i1}, c_{i2}, \ldots, c_{id})$ is:

$$h(c_{i1}, c_{i2}, \ldots, c_{id}) = \begin{pmatrix} r_{1,1} & r_{1,2} & \cdots & r_{1,d} \\ r_{2,1} & r_{2,2} & \cdots & r_{2,d} \\ \cdots & \cdots & \cdots & \cdots \\ r_{l,1} & r_{l,2} & \cdots & r_{l,d} \end{pmatrix} \bigodot \begin{pmatrix} c_{i1} \\ c_{i2} \\ \cdots \\ c_{id} \end{pmatrix} = \begin{pmatrix} z_1 \\ z_2 \\ \cdots \\ z_l \end{pmatrix}, \qquad (2)$$

where $\bigodot$ is equivalent to matrix multiplication in binary operations and is defined in [11].
The result $z = \langle z_1, z_2, \ldots, z_l \rangle$ is a string of 0s and 1s, which is the address of the entry where cell $c_i$ is located in the hash table. A detailed definition of the hash function is given in [11]. It can be proved that the hash function given in Equation 2 maps a cell to any hash bucket with equal probability, which reduces collisions. We further resolve collisions by extended queuing with each bucket.
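
The structure of Equation 2 can be illustrated with a short sketch. The exact binary multiplication operator is defined in [11] and is not reproduced in this text, so the code below assumes one plausible reading: products are taken as bitwise AND, sums as bitwise XOR, and each row result is reduced to a single bit by parity. The names `make_hash_matrix` and `cell_hash`, the 16-bit random entries, the fixed seed, and the use of integer interval indices as key components are likewise assumptions made only for this example.

```python
import random


def make_hash_matrix(l, d, bits=16, seed=0):
    """Random l-by-d integer matrix; l = log2(number of buckets)."""
    rng = random.Random(seed)
    return [[rng.getrandbits(bits) for _ in range(d)] for _ in range(l)]


def cell_hash(matrix, key):
    """Map a cell key (tuple of d interval indices) to an l-bit address.

    Assumed operator: AND as multiplication, XOR as addition, and the
    parity of each row result as the output bit z_t.
    """
    address = 0
    for row in matrix:
        acc = 0
        for r, c in zip(row, key):
            acc ^= r & c
        z = bin(acc).count("1") & 1       # one bit per matrix row
        address = (address << 1) | z      # z_1 z_2 ... z_l as a binary string
    return address


# A table with n = 2**4 buckets for 3-dimensional cells; collisions are
# resolved by keeping a list (queue) of cells in each bucket.
A = make_hash_matrix(l=4, d=3)
table = [[] for _ in range(2 ** 4)]
for key in [(3, 7, 1), (3, 8, 1), (12, 0, 5)]:
    table[cell_hash(A, key)].append(key)

# The neighbor of a cell along dimension 1 is found by hashing its key.
neighbor_bucket = cell_hash(A, (3, 7 + 1, 1))
```

Computing one address takes one pass over the $l$ rows of the matrix, and a neighbor is located by changing one component of the key and hashing again.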

Calculating Wavelet Transform on Hashed Feature Space

Wavelet transform is applied on the hashed representation of the quantized space to generate a new hash table consisting of only the significant cells (see the definition later). By scanning through the hash table and convolving the filter given in Equation 2 with each cell and its neighbors, we generate new cells in the transformed space. For any nonempty cell $c_i = \langle c_{i1}, c_{i2}, \ldots, c_{id} \rangle$, the cells which will contribute to its value in the transformed space along dimension $A_j$ are $c_k = \langle c_{i1}, c_{i2}, \ldots, c_{ij} + k, \ldots, c_{id} \rangle$, where $-\frac{M}{2} \le k \le \frac{M}{2}$ and $M$ is the width of the wavelet filter. All the cells stored in the hash table will get new values after wavelet transform is applied (see Figure 7). Also, because of the convolution operation in wavelet transform, some of the previously empty cells will become nonempty by receiving contributions from their neighboring cells. We call each potential nonempty cell a receiver and each old nonempty cell a contributor.

In a traditional implementation of wavelet transform, each receiver knows which cells to ask for contributions. Thus, the algorithm scans through all the potential receivers. In a multidimensional array implementation, every cell is considered a potential receiver, so the algorithm has to scan through the entire space of cells. We should avoid this in the high-dimensional case because of the exponential growth in the number of cells. In the hashed implementation, however, we only have information about the cells which are nonempty, so there are many potential receivers about whose hashed location we have no knowledge. Therefore, it is not possible to use traditional scanning algorithms directly on the hashed quantized feature space.

Figure 7: Traditional approach of calculating wavelet transform.

In our approach, each contributor knows its receivers, and the addresses of the receivers can be calculated by using the hash function in time $O(l)$. So, instead of receivers asking for values, contributors distribute values to receivers. Thus, it is sufficient to just scan through the contributors which are already saved in the hash table. With slight modifications, we can rewrite Formula 1 as:

$$\hat{s}_{i+j} = c_{\frac{M}{2}-j}\, s_i + \sum_{k=0}^{\frac{M}{2}-j-1} c_k\, s_{i+j-\frac{M}{2}+k} + \sum_{k=\frac{M}{2}-j+1}^{M-1} c_k\, s_{i+j-\frac{M}{2}+k}, \tag{3}$$

where $-\frac{M}{2} \le j < \frac{M}{2}$. Using this formula while scanning the hash table, each old nonempty cell or contributor is multiplied by a coefficient and the result is accumulated into its receiver cells, which are hashed into a new table (see Figure 8).

Figure 8: Accumulative approach of calculating wavelet transform.

Due to the generation of new nonempty cells, the number of cells in the new hash table will increase after wavelet transform is applied. In many cases, a large number of the new nonempty cells tend to have very small count values. Many of these low count values are expected to be caused by outliers rather than by the actual clusters. Also, the actual cluster shapes are distorted on the surfaces because of the directionality property of the convolution operation used in wavelet transform. Removing cells with low count values by applying a threshold on the count values will effectively remove the majority of the outliers and help preserve the original shape of the clusters. In addition, the reduction in the number of cells in the hash table is expected to improve the time complexity of the algorithm. We define a significant cell as a cell whose count value is greater than a particular threshold. In the new hash table constructed after applying wavelet transform on the original hash table, only the significant cells are stored. The threshold plays an important role in the quality of clustering and outlier removal. The details of determining the threshold can be found in [11].
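The contributor-driven accumulation and the thresholding step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: a Python `dict` keyed by cell index tuples stands in for the paper's hash table, the function names `scatter_transform` and `keep_significant` and the three-tap filter in the usage note are hypothetical, the receiver offsets are taken to be whatever keeps the coefficient index valid, and any downsampling performed by the actual wavelet transform is omitted.

```python
from collections import defaultdict


def scatter_transform(table, coeffs, axis):
    """Distribute each contributor's weighted count to its receivers along one axis.

    table  : dict mapping a cell (tuple of ints) to its count value
    coeffs : filter coefficients c_0 .. c_{M-1} (hypothetical values in this sketch)
    axis   : index of the dimension along which the filter is applied
    """
    half = len(coeffs) // 2
    out = defaultdict(float)
    # Scan only the contributors (nonempty cells); empty cells are never visited.
    for cell, count in table.items():
        for k, c in enumerate(coeffs):
            j = half - k  # receiver offset paired with coefficient c_k
            receiver = cell[:axis] + (cell[axis] + j,) + cell[axis + 1:]
            out[receiver] += c * count  # accumulate into the new table
    return dict(out)


def keep_significant(table, threshold):
    """Keep only the cells whose value is strictly greater than the threshold."""
    return {cell: value for cell, value in table.items() if value > threshold}
```

For instance, with two nonempty cells `{(0, 0): 4.0, (1, 0): 2.0}` and the assumed filter `[0.25, 0.5, 0.25]` applied along axis 0, the transformed table gains the previously empty receivers `(-1, 0)` and `(2, 0)`, and a threshold of 1.0 then keeps only `(0, 0)` and `(1, 0)`. The work is proportional to the number of nonempty cells times the filter width, which is the point of scanning contributors rather than receivers.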

Finding Connected Components in Hash Table

The hash table is essentially a graph $G = (V, E)$, where $V = \{c_i \mid c_i \text{ is a significant cell in transformed space}\}$ and $E = \{(c_1, c_2) \mid D(c_1, c_2) \le \varepsilon = 1\}$. Here, we define the distance $D$ as the city-block distance or $\|l_1\|$ metric:

$$D_{\|l_1\|}(c_1, c_2) = \sum_{i=1}^{d} |c_{1i} - c_{2i}|.$$

There is an edge between two cells if and only if their indices differ by one in exactly one dimension. So every significant cell has at most $2d$ neighbors. Let $c_j = \langle c_{j1}, c_{j2}, \ldots, c_{jk}, \ldots, c_{jd} \rangle$ be a cell and $c'_j = \langle c_{j1}, c_{j2}, \ldots, c'_{jk}, \ldots, c_{jd} \rangle$ be a neighbor of $c_j$, where $|c_{jk} - c'_{jk}| = 1$. The hashed index value of $c'_j$ can be computed using the following formula (footnote 1):

$$h(c'_j) = h(c_j) \oplus \begin{pmatrix} \#(c_{jk} \wedge r_{1,k}) \\ \#(c_{jk} \wedge r_{2,k}) \\ \vdots \\ \#(c_{jk} \wedge r_{d,k}) \end{pmatrix} \oplus \begin{pmatrix} \#(c'_{jk} \wedge r_{1,k}) \\ \#(c'_{jk} \wedge r_{2,k}) \\ \vdots \\ \#(c'_{jk} \wedge r_{d,k}) \end{pmatrix}, \tag{4}$$

where the operator $\oplus$ is defined to be bitwise exclusive OR. Thus, given the index of a cell, the bucket number of the neighboring cells can be computed by the hash function in Equation 4. The clusters are the connected components of graph $G$, which can be found by a depth-first-search algorithm. WaveCluster starts from the first bucket in the hash table, assigns it the first cluster number, and searches the cells it is connected to. It continues to scan the hash table until all the cells are visited.

Example

We use parallel coordinates to visualize the clusters in high-dimensional space. On the plane with xy-Cartesian coordinates and starting on the y-axis, $d$ parallel vertical lines are placed equidistant and perpendicular to the x-axis. They all have the same positive orientation as the y-axis. The values on each of the $d$ axes that correspond to an individual point in the data space are connected by line segments between successive vertical axes, resulting in a polygonal line. Figures 9-a and 9-b show two points C(3, 3, 1) and D(2, 2, 3) in the 3-dimensional space and their parallel coordinates.
Figure 9-c shows the parallel coordinates for a point $P = (v_1, v_2, \ldots, v_{d-1}, v_d)$ in d-dimensional space.

Figure 9: a) 3-D space; b) 3-D parallel coordinates; c) d-dimensional parallel coordinates.

Figure 10-a shows the parallel coordinate representation of a 12-dimensional dataset. This dataset has 50,000 objects (including 10% noise data) which are grouped into 9 clusters. WaveCluster's results in Figure 10-b show that it has detected all 9 clusters correctly and has removed the noise data. The clusters are color-coded; however, some of the clusters (which are not neighbors) may have the same color. Our analysis showed that about 95% of the data were correctly clustered by WaveCluster.

Figure 10: a) Original data space; b) Clustering results.

Footnote 1: See [11] for the proof.
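The connected-component search from the section on finding connected components in the hash table can be sketched as below. It is a minimal illustration under assumptions: a Python `set` of index tuples stands in for the hash table of significant cells, neighbor lookup uses ordinary set membership rather than the bucket-address formula of Equation 4, and the function name `label_clusters` is hypothetical. An explicit stack replaces recursion so that large clusters do not hit Python's recursion limit.

```python
def label_clusters(cells):
    """Assign a cluster number to every significant cell by depth-first search.

    cells : iterable of cells, each a tuple of d integer indices
    Returns a dict mapping each cell to its cluster number (1, 2, ...).
    """
    cells = set(cells)
    labels = {}
    next_label = 0
    # Sorting only makes the numbering deterministic; any scan order works.
    for start in sorted(cells):
        if start in labels:
            continue  # already reached from an earlier cell
        next_label += 1
        labels[start] = next_label
        stack = [start]
        while stack:
            cell = stack.pop()
            # A cell has at most 2d neighbors: +1 and -1 along each dimension.
            for axis in range(len(cell)):
                for step in (-1, 1):
                    neighbor = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                    if neighbor in cells and neighbor not in labels:
                        labels[neighbor] = next_label
                        stack.append(neighbor)
    return labels
```

Calling `label_clusters([(0, 0), (0, 1), (1, 1), (3, 3), (5, 5), (5, 6)])` groups the first three cells into one cluster (the diagonal pair `(0, 0)` and `(1, 1)` is joined only through `(0, 1)`), leaves `(3, 3)` as its own cluster, and puts the last two cells into a third.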

Time Complexity

It can be proved that by introducing the hash data structure to represent the dataset, we can cluster d-dimensional data with a time complexity of $O(N d \log n)$. Detailed analysis can be found in [11].

5 Conclusion

In this article, we presented the wavelet-based clustering approach for both low-dimensional and high-dimensional datasets. This grid-based approach applies wavelet transform on the quantized feature space and then detects the dense regions in the transformed space. Applying wavelet transform makes the clusters more distinct and salient in the transformed space and thus eases their detection. Using the multiresolution property of wavelet transform, WaveCluster can detect the clusters at different scales and levels of detail, which can be very useful in the user's applications. Moreover, applying wavelet transform removes the noise from the original feature space, so WaveCluster can handle noise properly and find more accurate clusters. Our approach does not make any assumption about the shape of clusters and can successfully detect arbitrarily shaped clusters such as concave or nested clusters. It is also a very efficient method, which makes it especially attractive for very large databases. This approach is insensitive to the order in which the input data are processed. Current clustering techniques do not address these issues sufficiently, although considerable work has been done in addressing each issue separately. This approach is the first attempt to apply the properties of wavelet transform to the clustering problem in spatial data mining. It is a clever, yet natural, application of wavelets with spectacular end results.

References

[1] Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, and Prabhakar Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 94-105, Seattle, WA.
[2] M. Ester, H. Kriegel, J. Sander, and X. Xu. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of 2nd International Conference on KDD.
[3] Sudipto Guha, Rajeev Rastogi, and Kyuseok Shim. Cure: An efficient clustering algorithm for large databases. In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 73-84, Seattle, WA.
[4] Berthold Klaus Paul Horn. Robot Vision. The MIT Press, fourth edition.
[5] S. Mallat. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11:674-693, July.
[6] R. T. Ng and J. Han. Efficient and Effective Clustering Methods for Spatial Data Mining. In Proceedings of the 20th VLDB Conference, pages 144-155, Santiago, Chile.
[7] S. Openshaw and P. Taylor. Quantitative Geography: A British View, chapter The Modifiable Areal Unit Problem, pages 60-69. London: Routledge.
[8] G. Sheikholeslami, S. Chatterjee, and A. Zhang. WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases. In Proceedings of the 24th VLDB Conference, pages 428-439, New York City, August.
[9] Greet Uytterhoeven, Dirk Roose, and Adhemar Bultheel. Wavelet transforms using lifting scheme. Technical Report ITA-Wavelets Report WP 1.1, Katholieke Universiteit Leuven, Department of Computer Science, Belgium, April.
[10] Wei Wang, Jiong Yang, and Richard Muntz. STING: A Statistical Information Grid Approach to Spatial Data Mining. In Proceedings of the 23rd VLDB Conference, pages 186-195, Athens, Greece.
[11] D. Yu, S. Chatterjee, G. Sheikholeslami, and A. Zhang. Efficiently detecting arbitrary shaped clusters in very large datasets with high dimensions. Technical Report 98-8, State University of New York at Buffalo, Department of Computer Science and Engineering, November.
[12] Tian Zhang, Raghu Ramakrishnan, and Miron Livny. BIRCH: An Efficient Data Clustering Method for Very Large Databases. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pages 103-114, Montreal, Canada.

Black Jack --- Review. Spring 2012

Black Jack --- Review. Spring 2012 Black Jack --- Review Spring 2012 Simulation Simulation can solve real-world problems by modeling realworld processes to provide otherwise unobtainable information. Computer simulation is used to predict

Læs mere

Privat-, statslig- eller regional institution m.v. Andet Added Bekaempelsesudfoerende: string No Label: Bekæmpelsesudførende

Privat-, statslig- eller regional institution m.v. Andet Added Bekaempelsesudfoerende: string No Label: Bekæmpelsesudførende Changes for Rottedatabasen Web Service The coming version of Rottedatabasen Web Service will have several changes some of them breaking for the exposed methods. These changes and the business logic behind

Læs mere

Basic statistics for experimental medical researchers

Basic statistics for experimental medical researchers Basic statistics for experimental medical researchers Sample size calculations September 15th 2016 Christian Pipper Department of public health (IFSV) Faculty of Health and Medicinal Science (SUND) E-mail:

Læs mere

Generalized Probit Model in Design of Dose Finding Experiments. Yuehui Wu Valerii V. Fedorov RSU, GlaxoSmithKline, US

Generalized Probit Model in Design of Dose Finding Experiments. Yuehui Wu Valerii V. Fedorov RSU, GlaxoSmithKline, US Generalized Probit Model in Design of Dose Finding Experiments Yuehui Wu Valerii V. Fedorov RSU, GlaxoSmithKline, US Outline Motivation Generalized probit model Utility function Locally optimal designs

Læs mere

CHAPTER 8: USING OBJECTS

CHAPTER 8: USING OBJECTS Ruby: Philosophy & Implementation CHAPTER 8: USING OBJECTS Introduction to Computer Science Using Ruby Ruby is the latest in the family of Object Oriented Programming Languages As such, its designer studied

Læs mere

On the complexity of drawing trees nicely: corrigendum

On the complexity of drawing trees nicely: corrigendum Acta Informatica 40, 603 607 (2004) Digital Object Identifier (DOI) 10.1007/s00236-004-0138-y On the complexity of drawing trees nicely: corrigendum Thorsten Akkerman, Christoph Buchheim, Michael Jünger,

Læs mere

Vina Nguyen HSSP July 13, 2008

Vina Nguyen HSSP July 13, 2008 Vina Nguyen HSSP July 13, 2008 1 What does it mean if sets A, B, C are a partition of set D? 2 How do you calculate P(A B) using the formula for conditional probability? 3 What is the difference between

Læs mere

Project Step 7. Behavioral modeling of a dual ported register set. 1/8/ L11 Project Step 5 Copyright Joanne DeGroat, ECE, OSU 1

Project Step 7. Behavioral modeling of a dual ported register set. 1/8/ L11 Project Step 5 Copyright Joanne DeGroat, ECE, OSU 1 Project Step 7 Behavioral modeling of a dual ported register set. Copyright 2006 - Joanne DeGroat, ECE, OSU 1 The register set Register set specifications 16 dual ported registers each with 16- bit words

Læs mere

Besvarelser til Lineær Algebra Reeksamen Februar 2017

Besvarelser til Lineær Algebra Reeksamen Februar 2017 Besvarelser til Lineær Algebra Reeksamen - 7. Februar 207 Mikkel Findinge Bemærk, at der kan være sneget sig fejl ind. Kontakt mig endelig, hvis du skulle falde over en sådan. Dette dokument har udelukkende

Læs mere

Sign variation, the Grassmannian, and total positivity

Sign variation, the Grassmannian, and total positivity Sign variation, the Grassmannian, and total positivity arxiv:1503.05622 Slides available at math.berkeley.edu/~skarp Steven N. Karp, UC Berkeley FPSAC 2015 KAIST, Daejeon Steven N. Karp (UC Berkeley) Sign

Læs mere

Vores mange brugere på musskema.dk er rigtig gode til at komme med kvalificerede ønsker og behov.

Vores mange brugere på musskema.dk er rigtig gode til at komme med kvalificerede ønsker og behov. På dansk/in Danish: Aarhus d. 10. januar 2013/ the 10 th of January 2013 Kære alle Chefer i MUS-regi! Vores mange brugere på musskema.dk er rigtig gode til at komme med kvalificerede ønsker og behov. Og

Læs mere

Skriftlig Eksamen Kombinatorik, Sandsynlighed og Randomiserede Algoritmer (DM528)

Skriftlig Eksamen Kombinatorik, Sandsynlighed og Randomiserede Algoritmer (DM528) Skriftlig Eksamen Kombinatorik, Sandsynlighed og Randomiserede Algoritmer (DM58) Institut for Matematik og Datalogi Syddansk Universitet, Odense Torsdag den 1. januar 01 kl. 9 13 Alle sædvanlige hjælpemidler

Læs mere

PARALLELIZATION OF ATTILA SIMULATOR WITH OPENMP MIGUEL ÁNGEL MARTÍNEZ DEL AMOR MINIPROJECT OF TDT24 NTNU

PARALLELIZATION OF ATTILA SIMULATOR WITH OPENMP MIGUEL ÁNGEL MARTÍNEZ DEL AMOR MINIPROJECT OF TDT24 NTNU PARALLELIZATION OF ATTILA SIMULATOR WITH OPENMP MIGUEL ÁNGEL MARTÍNEZ DEL AMOR MINIPROJECT OF TDT24 NTNU OUTLINE INEFFICIENCY OF ATTILA WAYS TO PARALLELIZE LOW COMPATIBILITY IN THE COMPILATION A SOLUTION

Læs mere

Resource types R 1 1, R 2 2,..., R m CPU cycles, memory space, files, I/O devices Each resource type R i has W i instances.

Resource types R 1 1, R 2 2,..., R m CPU cycles, memory space, files, I/O devices Each resource type R i has W i instances. System Model Resource types R 1 1, R 2 2,..., R m CPU cycles, memory space, files, I/O devices Each resource type R i has W i instances. Each process utilizes a resource as follows: request use e.g., request

Læs mere

Portal Registration. Check Junk Mail for activation . 1 Click the hyperlink to take you back to the portal to confirm your registration

Portal Registration. Check Junk Mail for activation  . 1 Click the hyperlink to take you back to the portal to confirm your registration Portal Registration Step 1 Provide the necessary information to create your user. Note: First Name, Last Name and Email have to match exactly to your profile in the Membership system. Step 2 Click on the

Læs mere

The X Factor. Målgruppe. Læringsmål. Introduktion til læreren klasse & ungdomsuddannelser Engelskundervisningen

The X Factor. Målgruppe. Læringsmål. Introduktion til læreren klasse & ungdomsuddannelser Engelskundervisningen The X Factor Målgruppe 7-10 klasse & ungdomsuddannelser Engelskundervisningen Læringsmål Eleven kan give sammenhængende fremstillinger på basis af indhentede informationer Eleven har viden om at søge og

Læs mere

Bilag. Resume. Side 1 af 12

Bilag. Resume. Side 1 af 12 Bilag Resume I denne opgave, lægges der fokus på unge og ensomhed gennem sociale medier. Vi har i denne opgave valgt at benytte Facebook som det sociale medie vi ligger fokus på, da det er det største

Læs mere

RoE timestamp and presentation time in past

RoE timestamp and presentation time in past RoE timestamp and presentation time in past Jouni Korhonen Broadcom Ltd. 5/26/2016 9 June 2016 IEEE 1904 Access Networks Working Group, Hørsholm, Denmark 1 Background RoE 2:24:6 timestamp was recently

Læs mere

The GAssist Pittsburgh Learning Classifier System. Dr. J. Bacardit, N. Krasnogor G53BIO - Bioinformatics

The GAssist Pittsburgh Learning Classifier System. Dr. J. Bacardit, N. Krasnogor G53BIO - Bioinformatics The GAssist Pittsburgh Learning Classifier System Dr. J. Bacardit, N. Krasnogor G53BIO - Outline bioinformatics Summary and future directions Objectives of GAssist GAssist [Bacardit, 04] is a Pittsburgh

Læs mere

Introduction Ronny Bismark

Introduction Ronny Bismark Introduction 1 Outline Motivation / Problem Statement Tool holder Sensor calibration Motion primitive Concatenation of clouds Segmentation Next possible pose Problems and Challenges Future Work 2 Motivation

Læs mere

CS 4390/5387 SOFTWARE V&V LECTURE 5 BLACK-BOX TESTING - 2

CS 4390/5387 SOFTWARE V&V LECTURE 5 BLACK-BOX TESTING - 2 1 CS 4390/5387 SOFTWARE V&V LECTURE 5 BLACK-BOX TESTING - 2 Outline 2 HW Solution Exercise (Equivalence Class Testing) Exercise (Decision Table Testing) Pairwise Testing Exercise (Pairwise Testing) 1 Homework

Læs mere

User Manual for LTC IGNOU

User Manual for LTC IGNOU User Manual for LTC IGNOU 1 LTC (Leave Travel Concession) Navigation: Portal Launch HCM Application Self Service LTC Self Service 1. LTC Advance/Intimation Navigation: Launch HCM Application Self Service

Læs mere

Applications. Computational Linguistics: Jordan Boyd-Graber University of Maryland RL FOR MACHINE TRANSLATION. Slides adapted from Phillip Koehn

Applications. Computational Linguistics: Jordan Boyd-Graber University of Maryland RL FOR MACHINE TRANSLATION. Slides adapted from Phillip Koehn Applications Slides adapted from Phillip Koehn Computational Linguistics: Jordan Boyd-Graber University of Maryland RL FOR MACHINE TRANSLATION Computational Linguistics: Jordan Boyd-Graber UMD Applications

Læs mere

Differential Evolution (DE) "Biologically-inspired computing", T. Krink, EVALife Group, Univ. of Aarhus, Denmark

Differential Evolution (DE) Biologically-inspired computing, T. Krink, EVALife Group, Univ. of Aarhus, Denmark Differential Evolution (DE) Differential Evolution (DE) (Storn and Price, 199) Step 1 - Initialize and evaluate Generate a random start population and evaluate the individuals x 2 search space x 1 Differential

Læs mere

Aggregation based on road topologies for large scale VRPs

Aggregation based on road topologies for large scale VRPs Aggregation based on road topologies for large scale VRPs Eivind Nilssen, SINTEF Oslo, June 12-14 2008 1 Outline Motivation and background Aggregation Some results Conclusion 2 Motivation Companies with

Læs mere

Exercise 6.14 Linearly independent vectors are also affinely independent.

Exercise 6.14 Linearly independent vectors are also affinely independent. Affine sets Linear Inequality Systems Definition 6.12 The vectors v 1, v 2,..., v k are affinely independent if v 2 v 1,..., v k v 1 is linearly independent; affinely dependent, otherwise. We first check

Læs mere

Help / Hjælp

Help / Hjælp Home page Lisa & Petur www.lisapetur.dk Help / Hjælp Help / Hjælp General The purpose of our Homepage is to allow external access to pictures and videos taken/made by the Gunnarsson family. The Association

Læs mere

Linear Programming ١ C H A P T E R 2

Linear Programming ١ C H A P T E R 2 Linear Programming ١ C H A P T E R 2 Problem Formulation Problem formulation or modeling is the process of translating a verbal statement of a problem into a mathematical statement. The Guidelines of formulation

Læs mere

Constant Terminal Voltage. Industry Workshop 1 st November 2013

Constant Terminal Voltage. Industry Workshop 1 st November 2013 Constant Terminal Voltage Industry Workshop 1 st November 2013 Covering; Reactive Power & Voltage Requirements for Synchronous Generators and how the requirements are delivered Other countries - A different

Læs mere

Particle-based T-Spline Level Set Evolution for 3D Object Reconstruction with Range and Volume Constraints

Particle-based T-Spline Level Set Evolution for 3D Object Reconstruction with Range and Volume Constraints Particle-based T-Spline Level Set for 3D Object Reconstruction with Range and Volume Constraints Robert Feichtinger (joint work with Huaiping Yang, Bert Jüttler) Institute of Applied Geometry, JKU Linz

Læs mere

Basic Design Flow. Logic Design Logic synthesis Logic optimization Technology mapping Physical design. Floorplanning Placement Fabrication

Basic Design Flow. Logic Design Logic synthesis Logic optimization Technology mapping Physical design. Floorplanning Placement Fabrication Basic Design Flow System design System/Architectural Design Instruction set for processor Hardware/software partition Memory, cache Logic design Logic Design Logic synthesis Logic optimization Technology

Læs mere

v Motivation v Multi- Atlas Segmentation v Learn Dictionary v Apply Dictionary v Results

v Motivation v Multi- Atlas Segmentation v Learn Dictionary v Apply Dictionary v Results Anatomical Atlas Probabilistic Atlas Shattuck, et al. NeuroImage. 2008 v Motivation v Multi- Atlas Segmentation v Learn Dictionary v Apply Dictionary v Results 2 Shattuck, et al. NeuroImage. 2008 Traditional

Læs mere

Curve Modeling B-Spline Curves. Dr. S.M. Malaek. Assistant: M. Younesi

Curve Modeling B-Spline Curves. Dr. S.M. Malaek. Assistant: M. Younesi Curve Modeling B-Spline Curves Dr. S.M. Malaek Assistant: M. Younesi Motivation B-Spline Basis: Motivation Consider designing the profile of a vase. The left figure below is a Bézier curve of degree 11;

Læs mere

Engelsk. Niveau D. De Merkantile Erhvervsuddannelser September Casebaseret eksamen. og

Engelsk. Niveau D. De Merkantile Erhvervsuddannelser September Casebaseret eksamen.  og 052431_EngelskD 08/09/05 13:29 Side 1 De Merkantile Erhvervsuddannelser September 2005 Side 1 af 4 sider Casebaseret eksamen Engelsk Niveau D www.jysk.dk og www.jysk.com Indhold: Opgave 1 Presentation

Læs mere

Skriftlig Eksamen Beregnelighed (DM517)

Skriftlig Eksamen Beregnelighed (DM517) Skriftlig Eksamen Beregnelighed (DM517) Institut for Matematik & Datalogi Syddansk Universitet Mandag den 31 Oktober 2011, kl. 9 13 Alle sædvanlige hjælpemidler (lærebøger, notater etc.) samt brug af lommeregner

Læs mere

Richter 2013 Presentation Mentor: Professor Evans Philosophy Department Taylor Henderson May 31, 2013

Richter 2013 Presentation Mentor: Professor Evans Philosophy Department Taylor Henderson May 31, 2013 Richter 2013 Presentation Mentor: Professor Evans Philosophy Department Taylor Henderson May 31, 2013 OVERVIEW I m working with Professor Evans in the Philosophy Department on his own edition of W.E.B.

Læs mere

Special VFR. - ved flyvning til mindre flyveplads uden tårnkontrol som ligger indenfor en kontrolzone

Special VFR. - ved flyvning til mindre flyveplads uden tårnkontrol som ligger indenfor en kontrolzone Special VFR - ved flyvning til mindre flyveplads uden tårnkontrol som ligger indenfor en kontrolzone SERA.5005 Visual flight rules (a) Except when operating as a special VFR flight, VFR flights shall be

Læs mere

Engineering of Chemical Register Machines

Engineering of Chemical Register Machines Prague International Workshop on Membrane Computing 2008 R. Fassler, T. Hinze, T. Lenser and P. Dittrich {raf,hinze,thlenser,dittrich}@minet.uni-jena.de 2. June 2008 Outline 1 Motivation Goal Realization

Læs mere

Statistical information form the Danish EPC database - use for the building stock model in Denmark

Statistical information form the Danish EPC database - use for the building stock model in Denmark Statistical information form the Danish EPC database - use for the building stock model in Denmark Kim B. Wittchen Danish Building Research Institute, SBi AALBORG UNIVERSITY Certification of buildings

Læs mere

DoodleBUGS (Hands-on)

DoodleBUGS (Hands-on) DoodleBUGS (Hands-on) Simple example: Program: bino_ave_sim_doodle.odc A simulation example Generate a sample from F=(r1+r2)/2 where r1~bin(0.5,200) and r2~bin(0.25,100) Note that E(F)=(100+25)/2=62.5

Læs mere

Observation Processes:

Observation Processes: Observation Processes: Preparing for lesson observations, Observing lessons Providing formative feedback Gerry Davies Faculty of Education Preparing for Observation: Task 1 How can we help student-teachers

Læs mere

ATEX direktivet. Vedligeholdelse af ATEX certifikater mv. Steen Christensen stec@teknologisk.dk www.atexdirektivet.

ATEX direktivet. Vedligeholdelse af ATEX certifikater mv. Steen Christensen stec@teknologisk.dk www.atexdirektivet. ATEX direktivet Vedligeholdelse af ATEX certifikater mv. Steen Christensen stec@teknologisk.dk www.atexdirektivet.dk tlf: 7220 2693 Vedligeholdelse af Certifikater / tekniske dossier / overensstemmelseserklæringen.

Læs mere

Unitel EDI MT940 June 2010. Based on: SWIFT Standards - Category 9 MT940 Customer Statement Message (January 2004)

Unitel EDI MT940 June 2010. Based on: SWIFT Standards - Category 9 MT940 Customer Statement Message (January 2004) Unitel EDI MT940 June 2010 Based on: SWIFT Standards - Category 9 MT940 Customer Statement Message (January 2004) Contents 1. Introduction...3 2. General...3 3. Description of the MT940 message...3 3.1.

Læs mere

Molio specifications, development and challenges. ICIS DA 2019 Portland, Kim Streuli, Molio,

Molio specifications, development and challenges. ICIS DA 2019 Portland, Kim Streuli, Molio, Molio specifications, development and challenges ICIS DA 2019 Portland, Kim Streuli, Molio, 2019-06-04 Introduction The current structure is challenged by different factors. These are for example : Complex

Læs mere

The complete construction for copying a segment, AB, is shown above. Describe each stage of the process.

The complete construction for copying a segment, AB, is shown above. Describe each stage of the process. A a compass, a straightedge, a ruler, patty paper B C A Stage 1 Stage 2 B C D Stage 3 The complete construction for copying a segment, AB, is shown above. Describe each stage of the process. Use a ruler

Læs mere

Small Autonomous Devices in civil Engineering. Uses and requirements. By Peter H. Møller Rambøll

Small Autonomous Devices in civil Engineering. Uses and requirements. By Peter H. Møller Rambøll Small Autonomous Devices in civil Engineering Uses and requirements By Peter H. Møller Rambøll BACKGROUND My Background 20+ years within evaluation of condition and renovation of concrete structures Last

Læs mere

Aktivering af Survey funktionalitet

Aktivering af Survey funktionalitet Surveys i REDCap REDCap gør det muligt at eksponere ét eller flere instrumenter som et survey (spørgeskema) som derefter kan udfyldes direkte af patienten eller forsøgspersonen over internettet. Dette

Læs mere

ECE 551: Digital System * Design & Synthesis Lecture Set 5

ECE 551: Digital System * Design & Synthesis Lecture Set 5 ECE 551: Digital System * Design & Synthesis Lecture Set 5 5.1: Verilog Behavioral Model for Finite State Machines (FSMs) 5.2: Verilog Simulation I/O and 2001 Standard (In Separate File) 3/4/2003 1 ECE

Læs mere

Statistik for MPH: 7

Statistik for MPH: 7 Statistik for MPH: 7 3. november 2011 www.biostat.ku.dk/~pka/mph11 Attributable risk, bestemmelse af stikprøvestørrelse (Silva: 333-365, 381-383) Per Kragh Andersen 1 Fra den 6. uges statistikundervisning:

Læs mere

Probabilistic properties of modular addition. Victoria Vysotskaya

Probabilistic properties of modular addition. Victoria Vysotskaya Probabilistic properties of modular addition Victoria Vysotskaya JSC InfoTeCS, NPK Kryptonite CTCrypt 19 / June 4, 2019 vysotskaya.victory@gmail.com Victoria Vysotskaya (Infotecs, Kryptonite) Probabilistic

Læs mere

Using SL-RAT to Reduce SSOs

Using SL-RAT to Reduce SSOs Using SL-RAT to Reduce SSOs Daniel R. Murphy, P.E. Lindsey L. Donbavand November 17, 2016 Presentation Outline Background Overview of Acoustic Inspection Approach Results Conclusion 2 Background Sanitary

Læs mere

De tre høringssvar findes til sidst i dette dokument (Bilag 1, 2 og 3). I forlængelse af de indkomne kommentarer bemærkes følgende:

De tre høringssvar findes til sidst i dette dokument (Bilag 1, 2 og 3). I forlængelse af de indkomne kommentarer bemærkes følgende: NOTAT VEDR. HØRINGSSVAR København 2018.10.26 BAGGRUND: Kommunalbestyrelsen i Frederiksberg Kommune vedtog den 18. april 2016 at igangsætte processen omkring etablering af et fælles gårdanlæg i karré 41,

Læs mere

Brug sømbrættet til at lave sjove figurer. Lav fx: Få de andre til at gætte, hvad du har lavet. Use the nail board to make funny shapes.

Brug sømbrættet til at lave sjove figurer. Lav fx: Få de andre til at gætte, hvad du har lavet. Use the nail board to make funny shapes. Brug sømbrættet til at lave sjove figurer. Lav f: Et dannebrogsflag Et hus med tag, vinduer og dør En fugl En bil En blomst Få de andre til at gætte, hvad du har lavet. Use the nail board to make funn

Læs mere

Engelsk. Niveau C. De Merkantile Erhvervsuddannelser September 2005. Casebaseret eksamen. www.jysk.dk og www.jysk.com.

Engelsk. Niveau C. De Merkantile Erhvervsuddannelser September 2005. Casebaseret eksamen. www.jysk.dk og www.jysk.com. 052430_EngelskC 08/09/05 13:29 Side 1 De Merkantile Erhvervsuddannelser September 2005 Side 1 af 4 sider Casebaseret eksamen Engelsk Niveau C www.jysk.dk og www.jysk.com Indhold: Opgave 1 Presentation

Læs mere

Den nye Eurocode EC Geotenikerdagen Morten S. Rasmussen

Den nye Eurocode EC Geotenikerdagen Morten S. Rasmussen Den nye Eurocode EC1997-1 Geotenikerdagen Morten S. Rasmussen UDFORDRINGER VED EC 1997-1 HVAD SKAL VI RUNDE - OPBYGNINGEN AF DE NYE EUROCODES - DE STØRSTE UDFORDRINGER - ER DER NOGET POSITIVT? 2 OPBYGNING

Læs mere

Heuristics for Improving

Heuristics for Improving Heuristics for Improving Model Learning Based Testing Muhammad Naeem Irfan VASCO-LIG LIG, Computer Science Lab, Grenoble Universities, 38402 Saint Martin d Hères France Introduction Component Based Software

Læs mere

Angle Ini/al side Terminal side Vertex Standard posi/on Posi/ve angles Nega/ve angles. Quadrantal angle

Angle Ini/al side Terminal side Vertex Standard posi/on Posi/ve angles Nega/ve angles. Quadrantal angle Mrs. Valentine AFM Objective: I will be able to identify angle types, convert between degrees and radians for angle measures, identify coterminal angles, find the length of an intercepted arc, and find

Læs mere
