DES™ is part of the MBS™ bioinformatics package!
DES™ performs data mining procedures, such as extracting patterns from data. As mass spectrometers gather more data, with the amount of data doubling every three years, data mining is becoming an increasingly important tool for transforming these data into information. DES™ performs the necessary procedures to extract valuable information from large MALDI or SELDI datasets.
The original data are usually transformed before the actual analysis so that they meet the assumptions of statistical procedures and/or to improve the interpretability of graphs.
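As a minimal sketch of what such a transformation looks like in practice, the Python snippet below applies a logarithmic and a square-root transform to a list of intensities. These two transforms are common choices for intensity data and are used here only as assumptions for illustration; the function names and the offset parameter are hypothetical and are not taken from DES™.

```python
import math

def log_transform(intensities, offset=1.0):
    """Natural-log transform; the offset keeps zero intensities finite."""
    return [math.log(x + offset) for x in intensities]

def sqrt_transform(intensities):
    """Square-root transform, often used to stabilize the variance of counts."""
    return [math.sqrt(x) for x in intensities]

# Illustrative intensities for one spectrum (made-up values).
spectrum = [0.0, 4.0, 9.0, 100.0]
logged = log_transform(spectrum)
rooted = sqrt_transform(spectrum)
```

Both transforms compress large intensities more strongly than small ones, which tends to make skewed intensity distributions more symmetric.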
Available transformations are:
In one-dimensional (descriptive) data analysis, popular summary statistics are used to summarize the basic properties of a single (selected) feature (M/Z). The most commonly used measures of central tendency and measures of variability are computed.
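A minimal sketch of such a one-feature summary, using only the Python standard library, might look as follows. The particular set of statistics and the `summarize` helper are assumptions chosen for illustration and do not reproduce the exact list offered by the software.

```python
import statistics

def summarize(values):
    """Central tendency and variability for the intensities of one feature (M/Z)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
    }

# Illustrative intensities of a single M/Z across five spectra (made-up values).
intensities = [2.0, 4.0, 4.0, 5.0, 10.0]
summary = summarize(intensities)
```

In this example the mean (5.0) is pulled above the median (4.0) by the single large value, which is the kind of asymmetry these summaries are meant to reveal.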
Descriptive statistics available:
All summary statistics are computed in two variants:
Results:
The aim of multi-dimensional data visualization is to provide a graphical summary of relationships (similarities) between samples (spectra) or between features (M/Z's) in a given dataset. Results are displayed as easy-to-interpret heatmaps that show correlations between samples or between features, respectively.
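The matrix behind such a heatmap can be sketched as below. The text does not say which correlation coefficient is used, so Pearson correlation is assumed here, and the helper names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(rows):
    """Pairwise correlations between rows (samples); transpose for features."""
    return [[pearson(r, s) for s in rows] for r in rows]

# Three illustrative spectra measured at four M/Z values (made-up intensities).
spectra = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # proportional to the first spectrum
    [4.0, 3.0, 2.0, 1.0],  # reversed pattern
]
matrix = correlation_matrix(spectra)
```

Each cell of the matrix would be mapped to a color in the heatmap; passing the transposed data gives the feature-by-feature variant.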
Available tools:
When correlations between features are computed, a smaller number of best features is first selected using the AUC (area under the ROC curve) statistic in order to facilitate the analysis.
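A minimal sketch of AUC-based feature selection is shown below. The AUC is computed through its pairwise interpretation (the fraction of disease/control pairs in which the disease sample has the larger intensity). Ranking features by their distance from 0.5 is an assumption made here, as are the function names and the example data.

```python
def auc(disease_values, control_values):
    """AUC as the share of (disease, control) pairs ordered correctly; ties count half."""
    wins = 0.0
    for d in disease_values:
        for c in control_values:
            if d > c:
                wins += 1.0
            elif d == c:
                wins += 0.5
    return wins / (len(disease_values) * len(control_values))

def select_best_features(features, n_best):
    """features maps an M/Z label to (disease intensities, control intensities)."""
    # An AUC near 0 separates the groups as well as one near 1,
    # so features are ranked by their distance from 0.5 (an assumption).
    scored = {mz: abs(auc(d, c) - 0.5) for mz, (d, c) in features.items()}
    return sorted(scored, key=scored.get, reverse=True)[:n_best]

# Illustrative data with hypothetical M/Z labels.
features = {
    "1012.5": ([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]),
    "2040.1": ([1.0, 4.0, 2.0], [3.0, 1.5, 2.5]),
    "3377.8": ([1.0, 2.0, 8.0], [7.0, 3.0, 9.0]),
}
best = select_best_features(features, 2)
```

Only the selected features would then enter the feature-by-feature correlation heatmap.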
Results:
Identification of disease-specific biomarkers is one of the major topics of proteomics, providing opportunities to develop and validate new therapeutic or diagnostic strategies.
The biomarker detection analysis available in the software allows for the examination of the discriminative properties of all features. The ability to discriminate between the "disease" and "control" groups is assessed with the aid of a selected separation measure.
The following separation measures are available:
The result is reported as a feature ranking showing the selected number of best features, i.e. features that exhibit the largest discriminative power. The obtained ranking can be further used to identify potential biomarkers.
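The ranking step can be sketched with the Fisher score as the separation measure. The formula used below, the squared difference of group means divided by the sum of group variances, is one common definition and is an assumption about how the score is computed; the function names and data are hypothetical.

```python
import statistics

def fisher_score(disease_values, control_values):
    """Squared gap between group means relative to the within-group variances."""
    mean_gap = statistics.mean(disease_values) - statistics.mean(control_values)
    spread = statistics.variance(disease_values) + statistics.variance(control_values)
    return mean_gap ** 2 / spread  # assumes at least one group has nonzero variance

def rank_features(features, n_best):
    """Return the n_best (M/Z label, score) pairs, largest score first."""
    scores = {mz: fisher_score(d, c) for mz, (d, c) in features.items()}
    ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranking[:n_best]

# Illustrative intensities with hypothetical M/Z labels.
features = {
    "1500.2": ([10.0, 12.0, 14.0], [1.0, 3.0, 5.0]),
    "2750.9": ([4.0, 6.0, 8.0], [3.0, 5.0, 7.0]),
    "4100.0": ([2.0, 2.0, 8.0], [1.0, 5.0, 6.0]),
}
ranking = rank_features(features, 2)
```

Features at the top of such a ranking show the largest separation between groups and are the natural candidates for follow-up as potential biomarkers.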
Results:
Principal Component Analysis (PCA) is a widely used feature extraction and visualization technique that transforms the original features into a smaller number of important directions called principal components (PCs). The first principal component accounts for as much of the data variability as possible, the second component (orthogonal to the first one) accounts for as much of the remaining variability as possible, and so on.
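To illustrate how the first principal component captures most of the variability, the sketch below centers the data, builds the covariance matrix, and finds its dominant eigenvector by power iteration. Power iteration is a simple stand-in chosen for this example and not necessarily the method used by the software; all names and data are hypothetical.

```python
import math

def center(rows):
    """Subtract the per-feature mean from every sample."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance_matrix(centered):
    """Sample covariance matrix of already centered data."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_principal_component(cov, iterations=200):
    """Dominant eigenvector of cov (unit length) and the variance along it."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    return v, variance

# Four illustrative spectra with two strongly related features (made-up values).
spectra = [[2.0, 1.0], [4.0, 2.1], [6.0, 2.9], [8.0, 4.0]]
cov = covariance_matrix(center(spectra))
direction, variance = first_principal_component(cov)
explained_share = variance / (cov[0][0] + cov[1][1])
```

Because the two features rise together, almost all of the total variance lies along the single direction found, which is why a few PCs can often replace many original features.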
Results of PCA analysis are presented as:
Results:
Partial Least Squares (PLS), also known as projection to latent structures by means of partial least squares, finds linear combinations of the original features (predictors) whose correlation with the response (i.e. disease status) is maximized, and which are mutually uncorrelated.
Results:
Mass spectrometry-based proteomics is a powerful technology used in studies concerned with classification of disease states. The protein samples from disease patients and control (i.e. non-disease) patients are analyzed through MS instruments and the resulting MS patterns are used to build a classifier.
The software performs binary classification of disease states, i.e. classification of samples (spectra) into either the disease or the control group. Currently available classification algorithms include:
Additionally, a feature selection step (preceding the actual classification) is included in the analysis. A subset of best features can be selected using a given separation measure, including: divergence, Fisher score, SAM score, T-test, Kolmogorov-Smirnov (KS) statistic, T-score, Wilcoxon-Mann-Whitney (WMW) test and AUC.
Classifier performance is assessed using a standard random learning/test set split scheme and a number of accuracy measures. The following performance measures are used:
The performance results are presented for learning and test sets separately.
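The evaluation scheme can be sketched as follows: split the samples at random, fit on the learning set, and report the same measures for both sets. The threshold classifier below is a hypothetical stand-in and not one of the software's algorithms, and accuracy, sensitivity and specificity are assumed here as typical performance measures.

```python
import random

def split_learning_test(values, labels, test_fraction=0.3, seed=0):
    """Randomly split samples into a learning set and a test set."""
    indices = list(range(len(values)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(test_fraction * len(indices)))
    def pick(idx):
        return [values[i] for i in idx], [labels[i] for i in idx]
    return pick(indices[n_test:]), pick(indices[:n_test])

def fit_threshold(values, labels):
    """Stand-in classifier: midpoint between the two class means.
    Assumes disease samples have the higher intensity."""
    disease = [v for v, l in zip(values, labels) if l == "disease"]
    control = [v for v, l in zip(values, labels) if l == "control"]
    return (sum(disease) / len(disease) + sum(control) / len(control)) / 2

def predict(values, threshold):
    return ["disease" if v > threshold else "control" for v in values]

def performance(true_labels, predicted):
    pairs = list(zip(true_labels, predicted))
    tp = sum(t == "disease" and p == "disease" for t, p in pairs)
    tn = sum(t == "control" and p == "control" for t, p in pairs)
    fp = sum(t == "control" and p == "disease" for t, p in pairs)
    fn = sum(t == "disease" and p == "control" for t, p in pairs)
    def ratio(num, den):
        return num / den if den else None  # undefined when a class is absent
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }

# Illustrative intensities of one selected feature (made-up values).
values = [9.1, 8.7, 9.5, 8.9, 9.3, 1.2, 0.8, 1.1, 1.5, 0.9]
labels = ["disease"] * 5 + ["control"] * 5
(learn_x, learn_y), (test_x, test_y) = split_learning_test(values, labels)
threshold = fit_threshold(learn_x, learn_y)
learning_report = performance(learn_y, predict(learn_x, threshold))
test_report = performance(test_y, predict(test_x, threshold))
```

Comparing the two reports shows whether the classifier generalizes: a large gap between learning and test performance would indicate overfitting.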
Results:
Cluster Analysis (or clustering) is used to create homogeneous groups of objects (clusters), where objects in one cluster are similar to each other and objects in different clusters are quite distinct.
In order to find clusters in data, the popular k-means clustering algorithm is used. A whole range of cluster numbers can be investigated in order to find the optimal one.
For each number of clusters (considered in the analysis), evaluation of clustering quality is performed using a selected validation index. Currently available clustering validation indices include:
Results of the clustering analysis are displayed as a simple line plot showing values of the validation index against the number of clusters.
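The procedure of scanning a range of cluster numbers can be sketched as below. The mean silhouette width is used as the validation index purely as an assumption, since it is a common choice, and the deterministic farthest-point initialization is a simplification for reproducibility; neither is claimed to match the software, and all names and data are hypothetical.

```python
import math

def farthest_point_init(points, k):
    """Deterministic start: repeatedly add the point farthest from chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, iterations=50):
    """Plain k-means; returns the cluster index assigned to each point."""
    centroids = farthest_point_init(points, k)
    assignment = [0] * len(points)
    for _ in range(iterations):
        assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment

def mean_silhouette(points, assignment):
    """Mean silhouette width; points in singleton clusters contribute 0."""
    clusters = {}
    for p, a in zip(points, assignment):
        clusters.setdefault(a, []).append(p)
    total = 0.0
    for p, a in zip(points, assignment):
        own = clusters[a]
        if len(own) == 1:
            continue
        within = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        nearest = min(sum(math.dist(p, q) for q in other) / len(other)
                      for label, other in clusters.items() if label != a)
        total += (nearest - within) / max(within, nearest)
    return total / len(points)

# Three well-separated illustrative groups of samples (made-up 2-D values).
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (1.1, 1.3),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.3),
    (9.0, 1.0), (9.2, 0.8), (8.8, 1.2), (9.1, 1.1),
]
scores = {k: mean_silhouette(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
```

Plotting `scores` against the number of clusters gives the kind of line plot described above, with the peak marking the suggested number of clusters.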
Results:

MBS™ is a flexible bioinformatics tool created for a range of mass spectrometry data analysis tasks. It contains many different algorithms developed by experts in mass spectrometry, proteomics and statistics.

MBS™ contains the software tool PDS™ for Peak Extraction and Peak Detection, and the software tool DES™ for Data-Mining and Pattern Extraction, both used for MALDI Proteomics Data Analysis.

MBS™ contains the software tool PAS™ for Pair Extraction, De Novo Sequencing, Detection of Phosphorylated Peptides, and Detecting Known and Unknown Peptide Modifications, used for TANDEM MS (MS/MS) Data Analysis.