Multimedia databases provide features that allow users to store and query different types of multimedia information, including images, video clips, audio clips, and documents. The main type of database query needed involves locating multimedia sources that contain certain objects of interest.

Content-based retrieval is a type of multimedia database query in which a multimedia source is retrieved because it contains certain objects or activities. Hence, a multimedia database must use some model to organize and index the multimedia sources based on their contents. Identifying the contents of multimedia sources is a difficult and time-consuming task. There are two main approaches. The first is based on automatic analysis of the multimedia sources to identify certain mathematical characteristics of their contents. This approach uses different techniques depending on the type of multimedia source (image, video clip, audio clip, or document). The second approach depends on manual identification of the objects and activities of interest in each multimedia source, and on using this information to index the sources. This approach can be applied to all types of multimedia sources, but it requires a manual preprocessing phase in which a person scans each source to identify and catalog the objects and activities it contains.
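
The manual approach can be illustrated with a small inverted index. This is only a sketch: the catalog contents, the file names, and the helper names `build_index` and `find_sources` are made up for illustration and assume a person has already labeled each source.

```python
from collections import defaultdict

def build_index(catalog):
    """Map each label to the set of sources whose annotations contain it."""
    index = defaultdict(set)
    for source, labels in catalog.items():
        for label in labels:
            index[label.lower()].add(source)
    return index

def find_sources(index, *labels):
    """Return, in sorted order, the sources that contain every requested label."""
    sets = [index.get(label.lower(), set()) for label in labels]
    return sorted(set.intersection(*sets)) if sets else []

# Hypothetical hand-made catalog: source name -> objects/activities it contains.
catalog = {
    "beach.jpg": ["person", "dog", "running"],
    "lecture.mp4": ["person", "whiteboard"],
    "park.mp4": ["dog", "running"],
}
index = build_index(catalog)
print(find_sources(index, "dog", "running"))
```

A query such as "sources that show a dog running" then becomes an intersection of the index entries for the two labels.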

An image is typically stored either in raw form as a set of pixel or cell values, or in compressed form to save space. The image shape descriptor describes the geometric shape of the raw image, which is typically a rectangle of cells of a certain width and height. Hence, each image can be represented by an m by n grid of cells. Each cell contains a pixel value that describes the cell content. In black-and-white images, a pixel can be one bit. In grayscale or color images, a pixel is multiple bits. Because images may require large amounts of space, they are often stored in compressed form. Compression standards such as GIF and JPEG reduce the amount of data stored while still maintaining the main image characteristics; transform-based schemes such as JPEG do this by applying mathematical transformations. The transforms that can be used include the discrete Fourier transform, the discrete cosine transform, and wavelet transforms.
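
To make the idea of transform-based compression concrete, here is a minimal sketch that applies a one-dimensional discrete cosine transform to a single row of pixel values and keeps only the first few coefficients. It is a simplification under stated assumptions: real codecs such as JPEG work on two-dimensional blocks and add quantization and entropy coding, and the function names and sample row are illustrative only.

```python
import math

def dct(values):
    """Orthonormal DCT-II of a list of numbers."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        out.append(scale * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, v in enumerate(values)))
    return out

def idct(coeffs):
    """Inverse of dct(): rebuild the samples from the coefficients."""
    n = len(coeffs)
    return [sum(math.sqrt((1 if k == 0 else 2) / n) * c
                * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k, c in enumerate(coeffs))
            for i in range(n)]

def compress_row(row, keep):
    """Keep the first `keep` (low-frequency) coefficients and zero the rest."""
    coeffs = dct(row)
    return coeffs[:keep] + [0.0] * (len(row) - keep)

row = [52, 55, 61, 66, 70, 61, 64, 73]        # made-up grayscale values
approx = idct(compress_row(row, keep=4))      # lossy reconstruction
print([round(v) for v in approx])
```

Only the retained coefficients would need to be stored; the reconstruction is approximate, which is the trade-off a lossy scheme accepts.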

To identify objects of interest in an image, the image is typically divided into homogeneous segments using a homogeneity predicate. The homogeneity predicate defines the conditions for automatically grouping neighboring cells into a segment. Segmentation and compression can hence identify the main characteristics of an image. A typical image database query would find the images in the database that are similar to a given image. There are two main techniques for this type of search. The first approach uses a distance function to compare the given image with the stored images and their segments. If the distance value returned is small, the probability of a match is high. The second approach, called the transformation approach, measures similarity by how few transformations (such as rotations, translations, and scaling) are needed to make the cells of one image match the other.
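
The following sketch shows one possible homogeneity predicate and one possible distance function. Both are assumptions chosen for simplicity: the predicate groups 4-connected cells whose value is within `tolerance` of the segment's first cell, and the distance is the plain Euclidean distance between two grayscale grids of the same size. The sample image is made up.

```python
import math

def segment(image, tolerance):
    """Label 4-connected regions whose cells stay within `tolerance` of the seed cell."""
    rows, cols = len(image), len(image[0])
    labels = [[None] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] is not None:
                continue
            seed = image[r][c]                  # first cell of a new segment
            labels[r][c] = next_label
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny][nx] is None
                            and abs(image[ny][nx] - seed) <= tolerance):
                        labels[ny][nx] = next_label
                        stack.append((ny, nx))
            next_label += 1
    return labels, next_label

def distance(image_a, image_b):
    """Euclidean distance between two same-sized grayscale grids."""
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(image_a, image_b)
                         for a, b in zip(row_a, row_b)))

image = [
    [10, 11, 200, 201],
    [12, 10, 199, 200],
    [11, 12, 198, 202],
]
labels, count = segment(image, tolerance=5)
print(count, labels[0])
```

A similarity query could then rank stored images by `distance` to the query image and return those with the smallest values.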

A video source is typically represented as a sequence of frames, where each frame is a still image. However, rather than identifying the objects and activities in every individual frame, the video is divided into video segments, where each segment is made up of a sequence of contiguous frames that includes the same objects or activities. The objects and activities identified in each segment can then be used to index the segments.
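
As a rough sketch of this idea, the code below assumes that each frame has already been annotated with the set of objects it shows (the labels and helper names are hypothetical). It merges contiguous frames with identical label sets into segments and then indexes the segments by label.

```python
def segment_video(frame_labels):
    """Group contiguous frames with identical labels into (start, end, labels)."""
    segments = []
    for i, labels in enumerate(frame_labels):
        labels = frozenset(labels)
        if segments and segments[-1][2] == labels:
            start, _, _ = segments[-1]
            segments[-1] = (start, i, labels)   # extend the current segment
        else:
            segments.append((i, i, labels))     # start a new segment
    return segments

def index_segments(segments):
    """Map each label to the (start, end) frame ranges in which it appears."""
    index = {}
    for start, end, labels in segments:
        for label in labels:
            index.setdefault(label, []).append((start, end))
    return index

# Hypothetical per-frame annotations for a five-frame clip.
frames = [{"car"}, {"car"}, {"car", "person"}, {"car", "person"}, {"person"}]
segments = segment_video(frames)
video_index = index_segments(segments)
print(video_index["car"])
```

Each segment is stored once with its starting and ending frame numbers, so the index stays much smaller than one entry per frame.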

A text or document source is basically the full text of some article, book, or magazine. These sources are usually indexed by identifying the keywords that appear in the text and their frequencies. Filler words and other common words (stopwords) are usually eliminated from this process. Because there could be too many keywords when attempting to index a collection of documents, techniques have been developed to reduce the number of keywords. A technique called singular value decomposition (SVD), which is based on matrix transformations, can be used for this purpose. An indexing technique called telescoping vector trees, or TV-trees, can then be used to group similar documents together.
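
The keyword-counting step can be sketched as follows. The stopword list here is a tiny illustrative sample rather than a standard list, and the later SVD and TV-tree steps are not shown.

```python
import re
from collections import Counter

# Small illustrative stopword list; real systems use much longer ones.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "on", "for"}

def keyword_frequencies(text):
    """Count how often each non-stopword appears in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

freqs = keyword_frequencies(
    "The index of the database is a database index for images."
)
print(freqs.most_common())
```

The resulting keyword counts form one row of the document-by-keyword matrix that a dimensionality-reduction technique such as SVD would then operate on.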

Audio sources consist of recorded messages, such as class presentations, speeches, or even surveillance recordings of phone messages or conversations made by law enforcement. Here, discrete transforms can be used to identify the main characteristics of a person's voice in order to support similarity-based indexing and retrieval.
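
As a toy illustration of extracting a characteristic from audio with a discrete transform, the sketch below computes a discrete Fourier transform of a short signal and reports its strongest frequency. It is only an assumption-laden example: the signal is a synthetic sine tone, the helper names are made up, and real voice identification relies on much richer features than a single dominant frequency.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitudes of the DFT bins from 0 up to the Nyquist bin."""
    n = len(samples)
    return [abs(sum(s * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, s in enumerate(samples)))
            for k in range(n // 2 + 1)]

def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest non-DC bin."""
    mags = dft_magnitudes(samples)
    k = max(range(1, len(mags)), key=mags.__getitem__)  # skip bin 0 (DC)
    return k * sample_rate / len(samples)

sample_rate = 64                                        # samples per second
tone = [math.sin(2 * math.pi * 8 * t / sample_rate)     # synthetic 8 Hz tone
        for t in range(sample_rate)]
print(dominant_frequency(tone, sample_rate))
```

Feature vectors built from such spectral values could then be compared with a distance function, in the same spirit as the image similarity search described earlier.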