Research on Content-Based Audio Retrieval

Date: 2022-08-11 10:18:14

Abstract: This paper introduces the basic structure of a content-based audio retrieval system. Drawing on related literature from China and abroad, it analyzes the main characteristics of audio retrieval algorithms, which fall into the following types: the minimum distance method, neural networks, support vector machines, decision tree search algorithms and other audio retrieval algorithms. It also discusses several key techniques of audio retrieval.

Keywords: fast audio retrieval; key techniques; support vector machine; decision tree search algorithm

0 Introduction

Audio is an important component of multimedia. People obtain audio information through many channels, such as film, television, radio, the Internet, telephone, concerts, meetings and conversations. As the volume of audio information that people handle keeps growing and audio types become increasingly diverse, the need to retrieve the desired audio quickly and effectively from this mass of information has become more and more important.

Many research institutions in China and abroad have studied audio retrieval. For example, the American company Muscle Fish developed a commercial audio search engine based on perceptual audio features; the Voice Graph system at the University of Maryland supports content-based retrieval of known speakers and words, including speaker-based queries, and provides a visual query interface for audio. In addition, the ARS system is a content-based audio information retrieval and classification system. It is built on an original audio library of nearly 300 audio files in "wav" format, covering more than a dozen categories such as speech, music, laughter, animal sounds and telephone sounds. For retrieval, five features are used (pitch, volume, brightness, bandwidth and zero-crossing rate), and a clustering algorithm based on Euclidean distance divides all files into 50 categories, forming a clustering parameter library. After clustering, the system builds an audio feature library from the features of the original audio, and together these form the audio database. Retrieval is then performed on this audio database using three methods: basic attribute retrieval, feature retrieval and text retrieval.
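The Euclidean-distance clustering step of ARS can be illustrated with plain k-means. This is a minimal sketch under stated assumptions: the source does not say which clustering algorithm ARS uses, so k-means is swapped in; the function `kmeans`, its farthest-point initialisation and the feature values below are hypothetical; and the demo uses k = 2 instead of the 50 clusters mentioned above.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Vector], k: int, iters: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical k-means clustering; returns (centroids, cluster label per point)."""
    # Deterministic farthest-point initialisation: start from the first point,
    # then repeatedly add the point farthest from the centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(far))

    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each clip joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: euclidean(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid moves to the mean of its member clips.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Made-up vectors: (pitch, volume, brightness, bandwidth, ZCR), already scaled to [0, 1].
clips = [
    [0.20, 0.80, 0.30, 0.50, 0.05],
    [0.22, 0.78, 0.32, 0.48, 0.06],
    [0.90, 0.30, 0.85, 0.90, 0.40],
    [0.88, 0.32, 0.80, 0.92, 0.42],
]
centroids, labels = kmeans(clips, k=2)
print(labels)  # [0, 0, 1, 1]
```

Raw pitch (in Hz) and volume live on very different scales, so in practice the features would normally be normalised before Euclidean clustering; otherwise the largest-valued feature dominates the distance.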

1 Content-based audio retrieval model

Figure 1 shows the flow chart of a content-based audio information retrieval system.

The first step is to build the database. Features are extracted from the audio data; the audio data themselves are stored in the original audio library (the media library section), the features are loaded into the feature library, and the features are clustered, with the clustering information stored in the clustering parameter library. Once the database has been built, audio information retrieval can be carried out.

The second step is to determine the query feature vector. Audio retrieval uses the query-by-example (QBE) mode: through the query interface, the user chooses an example and sets attribute values, then submits the query; the system extracts features from the example and combines them with the attribute values to form the query feature vector.

The third step is query feature matching. The search engine matches the query feature vector against the clustering parameter set according to their correlation, retrieves a certain number of matching entries from the feature library and the original audio library in descending order of correlation, and returns them to the user through the query interface. Here, the original audio library stores the audio data; the feature library stores the features of the audio data, record by record; and the clustering parameter library stores the parameters obtained by clustering the audio feature sets, including the codebook and threshold information of the feature vector space.
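The database layout and the matching step can be sketched roughly as follows. This is a hypothetical illustration rather than the structure of any particular system: the class `AudioDatabase`, its fields and the two-stage search (nearest cluster first, then ranking inside that cluster) are assumptions, and "correlation" is approximated by Euclidean distance, so ascending distance stands in for descending correlation.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class AudioDatabase:
    """Hypothetical three-part store mirroring the libraries described in the text."""
    media: Dict[str, str] = field(default_factory=dict)             # clip id -> file (original audio library)
    features: Dict[str, List[float]] = field(default_factory=dict)  # clip id -> vector (feature library)
    centroids: List[List[float]] = field(default_factory=list)      # clustering parameter library
    members: List[List[str]] = field(default_factory=list)          # clip ids belonging to each cluster

    def query(self, q: Sequence[float], n: int) -> List[Tuple[str, str, float]]:
        """Return up to n (clip id, file, distance) triples, most similar first."""
        # Stage 1: pick the cluster whose centroid is closest to the query vector.
        best = min(range(len(self.centroids)), key=lambda j: math.dist(q, self.centroids[j]))
        # Stage 2: rank only that cluster's clips; smaller distance = higher similarity.
        ranked = sorted(self.members[best], key=lambda cid: math.dist(q, self.features[cid]))
        return [(cid, self.media[cid], math.dist(q, self.features[cid])) for cid in ranked[:n]]


db = AudioDatabase(
    media={"a": "speech1.wav", "b": "speech2.wav", "c": "music1.wav", "d": "music2.wav"},
    features={"a": [0.2, 0.8], "b": [0.25, 0.75], "c": [0.9, 0.3], "d": [0.85, 0.35]},
    centroids=[[0.225, 0.775], [0.875, 0.325]],
    members=[["a", "b"], ["c", "d"]],
)
for cid, path, dist in db.query([0.21, 0.79], n=2):
    print(cid, path, round(dist, 4))
```

Searching only the nearest cluster keeps the cost low, but it can miss relevant clips that lie near a cluster boundary; probing a few neighbouring clusters is one way to trade speed for recall.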

The fourth step is refining the query results. Through human-computer interaction, the user refines the retrieval results step by step, narrowing the matching set until the audio that meets the user's needs is located.
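One round of this interactive narrowing might look like the sketch below. The source does not specify a refinement method; a positive-only, Rocchio-style relevance-feedback update is swapped in here as one plausible choice, and the function `refine`, the blend weight `alpha` and the sample data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def refine(query: Sequence[float], candidates: Dict[str, Sequence[float]], relevant: List[str],
           keep: int, alpha: float = 0.5) -> Tuple[List[str], List[float]]:
    """One feedback round: move the query toward the clips the user marked as relevant,
    then keep only the `keep` closest candidates (a smaller matching set)."""
    dims = range(len(query))
    mean = [sum(candidates[c][i] for c in relevant) / len(relevant) for i in dims]
    # Rocchio-style update (positive feedback only): blend the old query with the relevant mean.
    new_query = [alpha * query[i] + (1 - alpha) * mean[i] for i in dims]
    ranked = sorted(candidates, key=lambda c: math.dist(new_query, candidates[c]))
    return ranked[:keep], new_query


candidates = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [1.0, 0.9], "d": [0.0, 1.0]}
shortlist, q2 = refine([0.5, 0.5], candidates, relevant=["b"], keep=2)
print(shortlist, q2)  # ['c', 'b'] [0.75, 0.75]
```

Repeating the call on the shortlist, with a smaller `keep` each time, mimics the step-by-step shrinking of the matching set.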

2 Main techniques of content-based audio retrieval

(1) Audio feature extraction

Feature extraction produces a compact representation of the original audio signal: the extracted features should be able to represent the original signal data. Audio feature extraction relies on digital signal processing and is usually divided into time-domain analysis and frequency-domain analysis. Time-domain analysis works directly on the signal waveform, using measures such as the average zero-crossing rate, energy and the autocorrelation function. Frequency-domain analysis works on some representation of the audio spectrum, for example short-time FFT analysis, homomorphic deconvolution analysis, the wavelet transform and filter-bank methods.
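A few of the time-domain and frequency-domain measures named above can be computed on a single frame as follows. This is a minimal sketch assuming one already-windowed frame of float samples; the function names are hypothetical, and the naive O(N²) DFT stands in for the FFT a real system would use.

```python
import cmath
import math
from typing import List, Sequence


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Time domain: fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame: Sequence[float]) -> float:
    """Time domain: mean squared amplitude of the frame."""
    return sum(s * s for s in frame) / len(frame)


def autocorrelation(frame: Sequence[float], lag: int) -> float:
    """Time domain: sum of x[n] * x[n + lag] over the frame."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Frequency domain: |X[m]| of a naive DFT for bins 0 .. N/2."""
    n_samples = len(frame)
    return [
        abs(sum(frame[n] * cmath.exp(-2j * math.pi * m * n / n_samples) for n in range(n_samples)))
        for m in range(n_samples // 2 + 1)
    ]


N = 32
frame = [math.sin(2 * math.pi * 4 * n / N + 0.1) for n in range(N)]  # exactly 4 cycles per frame
spectrum = magnitude_spectrum(frame)
print(round(short_time_energy(frame), 6))                   # 0.5
print(max(range(len(spectrum)), key=spectrum.__getitem__))  # 4
```

Because the test tone completes a whole number of cycles in the frame, its energy falls entirely into DFT bin 4, which is why the spectral peak lands exactly there.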

(2) Audio classification

Automatic audio classification requires measuring audio similarity, and similarity retrieval is an important characteristic of content-based audio retrieval. Classification is therefore a core problem of content-based audio retrieval. Automatic classification of audio content is also important for improving speech recognition accuracy: by identifying the audio environment of speech before recognition, classification provides cues for selecting the speech model and for adaptively adjusting the algorithm.

There are many categories of audio information, and audio signals are strongly affected by environmental noise and by the transmission channel. The category of a clip reflects a high-level semantic feature of its content. Content-based automatic audio classification assumes that audio can be expressed by low-level acoustic features and that a mapping exists between these low-level features and the high-level categories; in practice, however, there is a considerable gap between the two. The technical difficulties of content-based audio retrieval therefore include how to extract low-level acoustic features that reflect the categories, and how to construct a classifier that better establishes the mapping between low-level acoustic features and high-level audio categories.

At present, most audio classification systems adopt statistical learning methods: given a number of training samples labeled with their categories, a classifier is generated by supervised training, and its performance is then measured by classifying the samples of a test set. Common audio classification algorithms include the minimum distance method, neural networks, support vector machines, the decision tree method and the hidden Markov model method.

(a) The minimum distance method

The minimum distance classification method is intuitive and simple, and it helps build a geometric understanding of classification in multidimensional space. Minimum distance methods used in audio classification include the K-nearest neighbor (K-NN) method and the nearest feature line (NFL) method. Rather than classifying an unknown sample X by its single nearest training sample, the K-NN method determines the category of X from the K training samples nearest to it. It therefore computes the distance D(X, Xi) between X and every training sample Xi, takes the K samples with the smallest distances as the nearest-neighbor set, computes for each category j the sum Wj of the distances of the neighbors belonging to that category, and classifies X according to the following rule:

When K = 1, the K-NN method degenerates into the nearest neighbor method. Because K-NN uses information from more samples to decide the class, a larger K helps reduce the influence of noise. However, since K-NN must compute the distance to every sample, the amount of computation becomes considerable when the number of samples is very large. The nearest feature line method addresses this problem: a few prototype feature points are chosen from the sample subspace of each class, each pair of these points is joined by a line called a feature line, and the set of feature lines is used to represent the sample subspace of each class.
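The K-NN procedure can be sketched as follows. Since the exact decision rule involving Wj is not reproduced here, this sketch assumes the common majority vote among the K neighbours, and uses the per-class summed distance (in the spirit of Wj) only as a hypothetical tie-break; `knn_classify` and the data are illustrative.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_classify(x: Sequence[float], samples: List[Tuple[Sequence[float], str]], k: int) -> str:
    """Label x by majority vote of its k nearest training samples (Euclidean distance)."""
    # Distance from x to every training sample Xi: this full scan is the cost noted in the text.
    dists = sorted((math.dist(x, xi), label) for xi, label in samples)
    neighbours = dists[:k]
    votes = Counter(label for _, label in neighbours)
    top = max(votes.values())
    tied = [c for c, v in votes.items() if v == top]
    # Hypothetical tie-break: prefer the class whose neighbours have the smallest summed distance.
    return min(tied, key=lambda c: sum(d for d, lab in neighbours if lab == c))


training = [
    ([0.10, 0.20], "speech"),
    ([0.15, 0.25], "speech"),
    ([0.90, 0.80], "music"),
    ([0.85, 0.90], "music"),
    ([0.50, 0.50], "music"),
]
print(knn_classify([0.20, 0.20], training, k=3))  # speech
print(knn_classify([0.80, 0.85], training, k=1))  # music (K = 1: plain nearest neighbour)
```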

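Let the prototype feature points of class $c$ be $\{X_1^c, X_2^c, \dots, X_{N_c}^c\}$, where $N_c$ is the number of prototype points of class $c$. The corresponding number of feature lines is $K_c = N_c(N_c-1)/2$, and the set of feature lines $\{\overline{X_i^c X_j^c}\}$ forms the feature line space $S_c$ of class $c$, which represents the subspace of class $c$. Because the number of selected prototype points is generally small, $K_c$ is also relatively small. The distance between an unknown sample $X$ and a feature line $\overline{X_i^c X_j^c}$ is defined as the projection distance of $X$ onto that line, as shown in Figure 2:

$$P = X_i^c + \mu\,(X_j^c - X_i^c), \qquad \mu = \frac{(X - X_i^c)\cdot(X_j^c - X_i^c)}{\lVert X_j^c - X_i^c \rVert^2}, \qquad d(X, \overline{X_i^c X_j^c}) = \lVert X - P \rVert$$

The shortest distance between $X$ and all feature lines of class $c$ is taken as the distance between $X$ and class $c$ in the feature line space, and $X$ is assigned to the class with the smallest such distance:

$$d(X, c) = \min_{1 \le i < j \le N_c} d(X, \overline{X_i^c X_j^c})$$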
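The projection distance and the class distance $d(X, c)$ translate almost directly into code. This is a minimal sketch assuming each class has at least two distinct prototype points; the function names and prototype values are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def feature_line_distance(x: Sequence[float], a: Sequence[float], b: Sequence[float]) -> float:
    """Projection distance from x to the (infinite) feature line through prototypes a and b."""
    d = [bj - aj for aj, bj in zip(a, b)]
    denom = sum(v * v for v in d)  # nonzero only if a != b
    mu = sum((xj - aj) * dj for xj, aj, dj in zip(x, a, d)) / denom
    p = [aj + mu * dj for aj, dj in zip(a, d)]  # projection point P on the line
    return math.dist(x, p)


def class_distance(x: Sequence[float], prototypes: List[Sequence[float]]) -> float:
    """d(X, c): minimum over the N_c (N_c - 1) / 2 feature lines of one class."""
    return min(feature_line_distance(x, a, b) for a, b in combinations(prototypes, 2))


def nfl_classify(x: Sequence[float], prototypes_by_class: Dict[str, List[Sequence[float]]]) -> str:
    """Assign x to the class whose feature line space is nearest."""
    return min(prototypes_by_class, key=lambda c: class_distance(x, prototypes_by_class[c]))


prototypes = {
    "speech": [[0.0, 0.0], [1.0, 0.0], [2.0, 0.1]],  # N_c = 3, so K_c = 3 feature lines
    "music": [[0.0, 3.0], [1.0, 3.0]],               # N_c = 2, so K_c = 1 feature line
}
print(feature_line_distance([5.0, 2.0], [0.0, 0.0], [1.0, 0.0]))  # 2.0
print(nfl_classify([4.0, 0.2], prototypes))  # speech
```

Because the line extends beyond its two prototypes, a point such as [5.0, 2.0] is measured against the extrapolated line, which is how NFL generalises from a few prototypes.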

(b) Neural Network

When a neural network is used for audio classification, the nodes of the input layer correspond to the components of the audio feature vector and the nodes of the output layer correspond to the categories Ci, as shown in Figure 3. During training, the network is adjusted by learning repeatedly from the training sample set so that the global error function is minimized. The trained network can then be expected to output the correct category Ci for a new sample T to be classified.
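A minimal network of this shape can be sketched as follows. The sketch assumes the simplest case, with no hidden layer, softmax outputs and a cross-entropy error reduced by per-sample gradient steps; the source does not specify the architecture, error function or training rule, and `SingleLayerNet` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the output-layer activations."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class SingleLayerNet:
    """Input nodes = feature-vector components, output nodes = categories Ci."""

    def __init__(self, n_in: int, n_out: int) -> None:
        self.w = [[0.0] * n_in for _ in range(n_out)]
        self.b = [0.0] * n_out

    def forward(self, x: Sequence[float]) -> List[float]:
        z = [sum(wi * xi for wi, xi in zip(row, x)) + bj for row, bj in zip(self.w, self.b)]
        return softmax(z)

    def train(self, xs, ys, epochs: int = 1000, lr: float = 0.5) -> None:
        # Repeated passes over the training set; each step lowers the cross-entropy
        # error of one sample (stochastic gradient descent on the overall error).
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                p = self.forward(x)
                for j in range(len(self.b)):
                    g = p[j] - (1.0 if j == y else 0.0)  # dError/dz_j for softmax + cross-entropy
                    self.b[j] -= lr * g
                    for i in range(len(x)):
                        self.w[j][i] -= lr * g * x[i]

    def predict(self, x: Sequence[float]) -> int:
        p = self.forward(x)
        return max(range(len(p)), key=p.__getitem__)


xs = [[0.1, 0.1], [0.0, 0.2], [0.9, 0.1], [1.0, 0.0], [0.1, 0.9], [0.0, 1.0]]
ys = [0, 0, 1, 1, 2, 2]
net = SingleLayerNet(n_in=2, n_out=3)
net.train(xs, ys)
print([net.predict(x) for x in xs])
```

A hidden layer trained by backpropagation would be needed for categories that are not linearly separable in feature space; the single-layer version only keeps the example short.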

(c) Support Vector Machine

The support vector machine (SVM) is a method proposed by Vapnik on the basis of the structural risk minimization principle. It was originally developed for two-class problems, and its mechanism can be described simply as finding, in the sample space, a hyperplane that separates the positive and negative training samples while maximizing the empty margin on both sides of it, as shown in Figure 4. SVM has a sound mathematical foundation: training is formulated as a quadratic programming problem, and kernel functions map the input data into a higher-dimensional space, so that classes that are not linearly separable can also be handled. SVM-based audio classification achieves higher accuracy than the nearest feature line method, and because of its good generalization performance its retrieval precision is also higher. However, SVM training is time-consuming, and suitable kernel functions and parameters must be selected through repeated experiments.
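The max-margin idea can be shown with a linear SVM. Note that this sketch swaps in a different solver from the one described above: instead of solving the quadratic program with a kernel, it minimises the equivalent primal objective (L2 penalty plus hinge loss) with Pegasos-style sub-gradient steps, uses no kernel, and folds the bias into the weight vector (so the bias is also regularised). The functions and data are hypothetical.

```python
from typing import List, Sequence


def train_linear_svm(xs: List[Sequence[float]], ys: List[int],
                     lam: float = 0.01, epochs: int = 500) -> List[float]:
    """Pegasos-style sub-gradient training of a linear SVM; labels must be -1 or +1."""
    w = [0.0] * (len(xs[0]) + 1)  # last weight acts as the bias (constant feature 1.0)
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):  # fixed order instead of random sampling, for reproducibility
            t += 1
            eta = 1.0 / (lam * t)
            xa = list(x) + [1.0]
            margin = y * sum(wi * xi for wi, xi in zip(w, xa))
            # Shrink w (gradient of the L2 term), then add the hinge-loss sub-gradient
            # when the sample is inside the margin or misclassified.
            w = [(1.0 - eta * lam) * wi for wi in w]
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, xa)]
    return w


def svm_predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Sign of the decision function w . [x, 1]."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


xs = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -1.0], [-1.0, -3.0]]
ys = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(xs, ys)
print([svm_predict(w, x) for x in xs])
```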

(d) The decision tree method

Decision trees have a simple structure and high search efficiency. Methods of this kind are grounded in information theory: important features are selected from a large number of examples to build the decision tree, as shown in Figure 5.

The decision tree classification method has the following advantages: (a) a complex decision region can be decomposed into simpler subspaces at different levels of the tree and approximated by their union; (b) the hierarchical, multistage structure gives high search efficiency; (c) in other classification methods, features are generally selected according to a global optimality criterion, for example maximizing the average between-class separation, whereas in the decision tree method a different feature subset can be selected for each branch node, as long as that subset gives the best discrimination at that node, so feature selection is relatively flexible; (d) each branch node deals with only a relatively small feature subspace, which helps avoid the "curse of dimensionality".
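Advantage (c), a different feature at each branch node, can be seen in a small information-gain tree. This sketch assumes ID3/C4.5-style binary threshold splits chosen by entropy reduction, which is one common information-theoretic criterion; the source does not name a specific algorithm, and the functions, feature order (energy, zero-crossing rate) and values are hypothetical.

```python
import math
from collections import Counter
from typing import Optional, Sequence, Tuple, Union

Tree = Union[str, tuple]  # a leaf label, or (feature index, threshold, left subtree, right subtree)


def entropy(labels: Sequence[str]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(xs, ys) -> Optional[Tuple[float, int, float]]:
    """Return (information gain, feature index, threshold) of the best binary split."""
    base = entropy(ys)
    best = None
    for f in range(len(xs[0])):
        values = sorted(set(x[f] for x in xs))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for x, y in zip(xs, ys) if x[f] <= thr]
            right = [y for x, y in zip(xs, ys) if x[f] > thr]
            gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(ys)
            if best is None or gain > best[0]:
                best = (gain, f, thr)
    return best


def build_tree(xs, ys, depth: int = 0, max_depth: int = 3) -> Tree:
    majority = Counter(ys).most_common(1)[0][0]
    if len(set(ys)) == 1 or depth == max_depth:
        return majority
    split = best_split(xs, ys)
    if split is None or split[0] <= 0:
        return majority
    _, f, thr = split  # each node picks its own feature
    left = [i for i, x in enumerate(xs) if x[f] <= thr]
    right = [i for i, x in enumerate(xs) if x[f] > thr]
    return (f, thr,
            build_tree([xs[i] for i in left], [ys[i] for i in left], depth + 1, max_depth),
            build_tree([xs[i] for i in right], [ys[i] for i in right], depth + 1, max_depth))


def predict(tree: Tree, x) -> str:
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if x[f] <= thr else right
    return tree


xs = [(0.01, 0.20), (0.02, 0.22), (0.60, 0.30), (0.70, 0.35), (0.65, 0.10), (0.75, 0.12)]
ys = ["silence", "silence", "speech", "speech", "music", "music"]
tree = build_tree(xs, ys)
print(tree)  # root splits on feature 0 (energy), its right child on feature 1 (ZCR)
print(predict(tree, (0.68, 0.33)))  # speech
```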

Designing an optimal decision tree is an NP-complete problem. The design principle can be stated formally as

$$\min_{T,\,F,\,d} P(\varepsilon \mid D, T, F, d)$$

where $T$ is the decision tree structure, $F$ and $d$ are respectively the feature subsets and the decision rules of the branch nodes, $D$ is the full training data set, and $P(\varepsilon \mid D, T, F, d)$ is the conditional probability of classification error $\varepsilon$ on $D$ for a tree $T$ whose decision rules $d$ are trained on $D$ using the selected feature subsets $F$. Constructing a decision tree can therefore be divided into three problems: selecting an appropriate tree structure, and selecting appropriate feature subsets and decision rules for the branch nodes.

3 Conclusions

Audio retrieval is a new research field closely linked to signal processing, research on auditory perception and psychology, pattern recognition and other disciplines; in China it is still at the stage of exploration and research. Because of the diversity of its retrieval objects and scope, audio information retrieval still has many problems to solve, for example how to combine a variety of methods to improve retrieval efficiency and accuracy. Enabling computers to understand audio semantics automatically, as people do, and realizing audio retrieval at the semantic level still pose many challenges in China.

