Summary: Data from a semiconductor manufacturing process.
Parameter | Value |
---|---|
Name | SECOM |
Labeled | Yes |
Time Series | No |
Simulation | No |
Missing Values | Yes |
Dataset Characteristics | Multivariate |
Feature Type | Real |
Associated Tasks | Classification, Causal-Discovery |
Number of Instances | 1567 |
Number of Features | 591 |
Date Donated | 2008-11-18 |
Source | UCI Machine Learning Repository |
Key facts: Data structure: The data consists of two files: the dataset file SECOM, containing 1567 examples with 591 features each (a 1567 x 591 matrix), and a labels file containing the classification and date-time stamp for each example.
As with any real-life data, this dataset contains null values, and the proportion of nulls varies from feature to feature. This needs to be taken into account when investigating the data, either through pre-processing or within the technique applied.
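One way to handle nulls at the pre-processing stage is to measure the fraction of missing values per feature, drop features that are mostly empty, and impute the rest. The sketch below is only an illustration under stated assumptions: the helper names, the 0.5 threshold, and median imputation are choices made here, not something the dataset description prescribes, and the toy matrix is made up rather than taken from SECOM.

```python
import math
import statistics


def missing_fraction_per_feature(rows):
    """Return, for each column, the fraction of rows whose value is NaN."""
    n_rows = len(rows)
    n_cols = len(rows[0])
    return [
        sum(math.isnan(row[j]) for row in rows) / n_rows
        for j in range(n_cols)
    ]


def drop_and_impute(rows, max_missing=0.5):
    """Drop columns with more than `max_missing` NaNs, then fill the
    remaining NaNs with the column median.

    `max_missing` is an assumed threshold and should stay below 1.0 so that
    every kept column has at least one observed value.
    """
    fractions = missing_fraction_per_feature(rows)
    keep = [j for j, frac in enumerate(fractions) if frac <= max_missing]
    medians = {
        j: statistics.median(row[j] for row in rows if not math.isnan(row[j]))
        for j in keep
    }
    cleaned = [
        [medians[j] if math.isnan(row[j]) else row[j] for j in keep]
        for row in rows
    ]
    return cleaned, keep


# Toy 4 x 3 matrix (made-up values, not real SECOM measurements).
nan = float("nan")
toy = [
    [1.0, nan, 3.0],
    [2.0, nan, nan],
    [4.0, nan, 5.0],
    [3.0, 8.0, 7.0],
]
cleaned, kept = drop_and_impute(toy, max_missing=0.5)
```

Here the middle column is 75% missing and is dropped, while the single gap in the last column is filled with that column's median. Techniques that tolerate missing values natively would skip this step entirely.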
The data is stored in a raw text file: each line represents one example, with features separated by spaces. Null values are represented by 'NaN', following the MATLAB convention.
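The file layout described above can be read with a few lines of standard-library Python. This is a minimal sketch: the function name `parse_secom_rows` and the in-memory sample are hypothetical, and the real file name and path are not assumed here. It relies on the fact that Python's `float()` accepts the string `NaN`.

```python
import io
import math


def parse_secom_rows(lines):
    """Parse space-separated feature rows; 'NaN' tokens become float('nan')."""
    rows = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        rows.append([float(tok) for tok in tokens])
    return rows


# Tiny made-up sample in the described layout (not real SECOM values).
sample = io.StringIO("1.5 NaN 3.0\n4.0 5.0 NaN\n")
rows = parse_secom_rows(sample)
```

With the real data, the same function could be given an open file handle instead of the `io.StringIO` sample; each returned row would then hold the feature values for one example.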
Keywords: Manufacturing, Semiconductor, Process optimization, Feature selection, Industrial data