Sensing and computing challenges for enhanced data integrity

This eBook presents academic ultra-clean scientific environments and the digital transformation challenges they face, especially the computer science challenges of providing enhanced scientific data integrity.

In Section 1, we describe the specificity of academic ultra-clean environments with their requirements and the role of computer science to meet these requirements.

Throughout Section 2, we discuss the digital scientific data acquisition from scientific instruments and their processing challenges for the computing infrastructure.

Section 3 presents the core of the computing, networking, and sensing infrastructures’ challenges to sense, process, distribute and visualize scientific data with high data integrity.

Presented in Section 4 are the difficult sustainability challenges that academic ultra-clean environments must face. The article concludes with a summary of issues that must be solved to speed up scientific innovation and give scientists digital tools to gain further scientific insights.

1. Specificity of Academic Ultra-Clean Environments

Semiconductor chip manufacturing has largely served as the backbone of the digital era. With each new generation of computing (calculators, computers, smartphones, AR/VR glasses), the supporting hardware has evolved to meet the performance and cost requirements needed to make handheld computing devices a reality and ubiquitous in modern society. One of the innovations required for chip manufacturing is the adoption of ultra-clean environments such as cleanrooms, as shown in Figure 1. As chips shrank and their processing became more sensitive, cleanrooms required ever tighter control of the environment.

Tighter control prevents stray particles from affecting chip yield and provides stable humidity, temperature, and airflow, which significantly improves chip yield and device performance.

Figure 1: Cleanroom in Holonyak Micro-Nano-Technology Laboratory

Silicon is the most widely used semiconductor material for modern chip manufacturing. Each new generation of integrated circuits has improved in speed and capability roughly every two years (a cadence commonly referred to as Moore’s Law) by shrinking transistors and consequently increasing their density on a chip. This trend continues today, when the average size of a transistor has reached the level of single nanometers. For perspective, the diameter of a single strand of human hair spans approximately 25,400 nanometers. As a result, if a single strand of hair landed on a wafer, thousands of devices would be wiped out by the resulting processing failures. This exemplifies the strict cleanliness required of cleanrooms to manufacture modern semiconductor chips.

Academic cleanrooms and their equipment at universities are very different from industrial cleanrooms. These differences stem from the fundamental functions that each is required to support. Industrial cleanrooms are ultra-clean environments designed to facilitate high-volume, high-yield manufacturing. With the supporting capital of multi-billion-dollar companies (Intel, TSMC, Samsung, etc.), these cleanrooms are equipped with state-of-the-art equipment and sensors, with the mission to produce the same chip design in massive quantities. Industrial cleanrooms maintain the highest degree of cleanliness and use a sensory network that constantly monitors the environment and keeps it strictly controlled. Chip manufacturing involves hundreds of processing steps that must be strictly controlled to achieve functioning integrated circuits. Since industrial chip manufacturing runs the same process repeatedly, chip manufacturers can collect a large batch of read-out data from each process. Read-out data such as temperature, pressure, and plasma power can indicate the “health” of each process.

On the other hand, academic cleanrooms function as a testbed to explore and investigate riskier innovative ideas. As a result, research topics such as quantum computing, 2D materials, and flexible electronics tend to introduce more exotic materials not commonly seen in an industrial cleanroom. These materials often require a different set of fabrication chemicals and safety standards that a silicon chip cleanroom would not typically encounter. In addition to the materials, the cleanroom users are quite different as well. In an industrial cleanroom, manufacturing teams of supervisors, engineers, and technicians form a well-trained group with the single goal of manufacturing chips. In an academic cleanroom, however, the users are mostly graduate students or post-doctoral researchers who do not receive the same calibre of intensive cleanroom training. Furthermore, the goals and research of each user differ vastly from one another. This requires a cleanroom that can support research on diverse materials and devices while being used largely by younger and less experienced personnel than in industrial cleanrooms. As most academic cleanrooms do not receive the same capital investment as industrial cleanrooms, most of their equipment and sensory networks are old and outdated. It is therefore important for digital transformation researchers to develop low-cost, self-deployable sensory networks that achieve the same functionality as the large, expensive sensory networks of industrial cleanrooms, so that academic cleanrooms can continue producing competitive and innovative research.

Challenges of Academic Cleanrooms:

Most equipment used as scientific tools in academic environments was designed for industrial fabrication applications. Thus, although these tools can be used for a variety of use cases, their ideal state is to run a single process repeatedly, which makes tool health easy to monitor. In academia, however, these tools are pushed to their limits. Each tool is used for a large diversity of processes by a variety of users who may have minimal experience with it. With limited budgets, academic cleanrooms tend to have older, manual tools, which further exacerbates the difficulty of maintaining the systems, and they rarely have backup equipment for when the tools inevitably need to be fixed. The goal for academic cleanrooms is therefore robust observation of the tools so that preventative maintenance can be performed, limiting the downtime of these expensive, essential tools.

The greatest challenge with academic cleanrooms and research is to support very diverse processes with limited digital datasets. The processes in an academic cleanroom are expensive due to the low-volume and customized nature of the research. As a result, an academic cleanroom produces far fewer digital measurements than artificial intelligence and machine learning (AI/ML) algorithms need to achieve high-accuracy data classification and/or object detection. Furthermore, most academic cleanrooms are equipped with outdated equipment and, because of the cost of implementing these features, do not possess the sensory network for environmental monitoring around equipment that industrial cleanrooms have. The capability to deploy low-cost sensory networks that support preventive maintenance in an academic cleanroom is therefore important to sustain a cleanroom environment that is competitive with state-of-the-art technology for academic researchers.

2. Scientific Data Acquisition and Processing from Scientific Instruments

For semiconductor processing, a large variety of digital data is produced during the scientific process. Datasets that include processing equipment read-outs such as gas flows, plasma power, and pressure provide a measure of the process characteristics (deposition thickness, etching depth, etc.) as well as process consistency and equipment health. On the other hand, several critical steps during device processing may require additional measurements to guarantee the accuracy and precision of the process. For instance, Scanning Electron Microscopy (SEM) images are used to verify sidewall profiles of etching processes. The main challenge is that each process can require a different set of equipment and a different set of measurement tools to verify that process. For example, in the case of etching, the equipment is an ICP-RIE (inductively coupled plasma reactive ion etching) etcher and the verification tool is an SEM, whereas in the case of deposition, the equipment is a PECVD (plasma-enhanced chemical vapor deposition) system and the verification tool is an ellipsometer that measures film thickness.
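
As a rough illustration of the pairing described above, each process step could be stored together with its equipment read-outs and its verification measurement. The sketch below is only one possible layout: the class names, field names, and file paths are hypothetical, and the read-out values are purely illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Verification:
    """A follow-up measurement that checks the outcome of a process step."""
    tool: str        # e.g. "SEM" or "ellipsometer"
    quantity: str    # what is measured, e.g. "sidewall profile"
    data_ref: str    # hypothetical path or ID of the stored measurement


@dataclass
class ProcessRun:
    """One fabrication step: equipment read-outs plus its verification data."""
    process: str                      # e.g. "etch" or "deposition"
    equipment: str                    # e.g. "ICP-RIE" or "PECVD"
    readouts: dict[str, float] = field(default_factory=dict)
    verifications: list[Verification] = field(default_factory=list)


# Illustrative values only; real read-outs depend on the tool and recipe.
etch = ProcessRun(
    process="etch",
    equipment="ICP-RIE",
    readouts={"gas_flow_sccm": 20.0, "plasma_power_W": 400.0, "pressure_mT": 8.0},
    verifications=[Verification("SEM", "sidewall profile", "sem/etch_001.tif")],
)
deposition = ProcessRun(
    process="deposition",
    equipment="PECVD",
    verifications=[Verification("ellipsometer", "film thickness", "ellip/dep_001.csv")],
)

for run in (etch, deposition):
    tools = ", ".join(v.tool for v in run.verifications)
    print(f"{run.process} on {run.equipment}: verified by {tools}")
```

Keeping the read-outs and the verification data in one record means that a different process only changes the values stored, not the structure used to store them.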

Figure 2: SEM Image and 4CeeD Tree View of Scientific Data Storage System

Given the wide variety of tools and their inconsistent usage from one academic researcher to another, the data collection process is often very manual. For process parameters and outcomes, such as the gas flow or power of an etching process and the resulting etch depth, a range of manual note-taking techniques is used at the time of the process. The most common methods include writing notes in individual notebooks or entering notes into individual or shared documents stored online. Other datasets that are already digitized, such as microscope images from an SEM (see Figure 2), are collected through shared drives, specifically designed scientific data storage systems, or local USB storage devices if the microscope has no internet connectivity due to tool age and security concerns. Most processing of this data is then done in separate labs or offices after the cleanroom processes have been completed.

Challenges of Scientific Data Acquisition and Processing Workflows:

The challenges of scientific data acquisition and processing include (1) data curation and processing, (2) multi-modal data fusion, and (3) failure analysis and anomaly detection.

Data curation and processing:

Because diverse data accumulates over an entire device creation process, and because there is no centralized data infrastructure that automatically combines the datasets from each tool in one location, most academic cleanroom data is isolated and discrete. In principle, the collected data is interlinked, because the processes are conducted serially and each one impacts the process after it. For academic researchers, however, most data is stored separately and often lacks the process information describing the previous steps that led to the resulting dataset. For instance, if six process steps are conducted before a researcher takes an SEM image of the fabricated device and notices an error, the researcher does not know whether step 5 or step 1 is the root cause. Only with the combined information of each process step can it be fully concluded which step caused the process failure.
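
One way to keep that combined information is to let every record point back to the step that preceded it, so that the full history can be recovered from any dataset. The following is a minimal sketch under that assumption; the `Step` class, the step names, and the parameter values are hypothetical and not taken from any existing system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    """A process step that remembers the step performed before it."""
    name: str
    parameters: dict
    previous: Optional["Step"] = None


def history(step: Step) -> list[str]:
    """Return the names of all steps up to and including `step`, oldest first."""
    names = []
    current: Optional[Step] = step
    while current is not None:
        names.append(current.name)
        current = current.previous
    return list(reversed(names))


# A short, hypothetical process chain ending in an SEM inspection.
clean = Step("wafer clean", {})
litho = Step("lithography", {"spin_rpm": 3000}, previous=clean)
etch = Step("etch", {"plasma_power_W": 400}, previous=litho)
sem = Step("SEM inspection", {"image": "sem/device_001.tif"}, previous=etch)

print(" -> ".join(history(sem)))
```

With such links in place, an SEM image that reveals an error carries the list of candidate steps with it instead of relying on separate notebooks.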

Furthermore, existing data storage infrastructure for microscopy images, such as a file explorer or Google Cloud, is based on a “tree view”. Without tediously opening each file, users can only see the experimental parameters that were written into the file name. This leads to extremely long file names that try to encompass the entire experiment in key-value pairs, such as “06-10-2022 GaAsEtch_BCl3-20sccm_Cl2-10sccm_Ar-5sccm_RIE-200W_ICP-400W_8mT.txt”. We have developed a research system called 4CeeD [Ngyuen2017] that displays all pertinent information in one easy format and alleviates the issues of a “tree view” data storage system (see Figure 2). Further integration of 4CeeD to achieve automatic data logging would be the final goal for a desired data storage system. However, challenges arise when digitizing data from old, outdated equipment that still uses analogue readout panels, and when navigating the proprietary software control systems of new fabrication equipment. An open-source method of interfacing with processing equipment is required to fully develop a low-cost, centralized private cloud data storage infrastructure that automatically collects data from each piece of equipment for academic researchers.
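
To illustrate how much structure is hidden in such file names, the sketch below parses the example name into a date, an experiment name, and key-value pairs. It assumes the naming pattern of that single example (a date, a space, then underscore-separated `key-valueunit` tokens); the function name and the handling of tokens without a key are assumptions, and this is not how 4CeeD itself ingests data.

```python
import re
from pathlib import PurePath

# Matches "<number><unit>", e.g. "20sccm", "400W", "8mT".
VALUE_UNIT = re.compile(r"^(\d+(?:\.\d+)?)([A-Za-z]+)$")


def parse_filename(filename: str) -> dict:
    """Split a file name of the form 'date name_key-valueunit_...' into metadata."""
    stem = PurePath(filename).stem           # drop the ".txt" extension
    date, _, rest = stem.partition(" ")      # the date comes before the first space
    name, *tokens = rest.split("_")
    params = {}
    for token in tokens:
        key, sep, value = token.rpartition("-")
        match = VALUE_UNIT.match(value)
        if match is None:
            continue  # skip tokens that do not look like "<number><unit>"
        number, unit = float(match.group(1)), match.group(2)
        # Tokens without a key (e.g. "8mT") are stored under their unit.
        params[key if sep else unit] = (number, unit)
    return {"date": date, "name": name, "parameters": params}


meta = parse_filename(
    "06-10-2022 GaAsEtch_BCl3-20sccm_Cl2-10sccm_Ar-5sccm_RIE-200W_ICP-400W_8mT.txt"
)
for key, (number, unit) in meta["parameters"].items():
    print(f"{key}: {number} {unit}")
```

A parser like this is fragile by nature, since every researcher may use a different naming pattern, which is part of the argument for capturing parameters in a structured storage system rather than in file names.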

Multi-modal data fusion:

The main challenge with collecting data from a cleanroom fabrication process is the diversity of data produced by a wide variety of scientific equipment. Furthermore, the interlinking and cascading effects of each process turn every dataset into a multi-modal data fusion problem. The challenge is how to automate tracking of the whole process and how to interlink and correlate the data.

From an individual fabrication process perspective, each process can have multiple datasets that describe the same phenomenon. For instance, a lithography process has a recipe with key-value pairs that describe the spin speed at which the photoresist is dispensed, the exposure dose with which the photoresist is activated, and the development time during which the unwanted photoresist is washed away. To verify the success of this process, however, optical or SEM images of the top view and sidewall view are taken to ensure that the correct dimensions and sidewall profile have been replicated.

From an interlinking process perspective, the characteristics of each process propagate into the next process. For instance, etching is a common process that follows lithography. If a defect in the lithography process is not identified during the visual inspection step, it will propagate into the etching process. When the defect is finally identified during the visual inspection after etching, it is easy to conclude, falsely, that the etching process has an issue, although the true failure occurred during lithography. Eliminating such false conclusions can save precious material, time, and processing resources, which significantly increases productivity in academic as well as industrial cleanrooms.
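
The reasoning above can be sketched as a small helper that lists every step that could have introduced a defect, rather than blaming only the step where it was first seen. This is a simplified illustration: the function name and the three-valued inspection result are hypothetical, and it assumes that a passed inspection can be trusted, which a purely visual inspection does not guarantee.

```python
from typing import Optional


def suspect_steps(steps: list[str], inspections: dict[str, Optional[bool]]) -> list[str]:
    """Return the steps that could have introduced an observed defect.

    `inspections` maps a step name to True (passed), False (defect seen),
    or None (not inspected). The suspects are every step after the last
    passed inspection, up to and including the first step with a defect.
    """
    suspects: list[str] = []
    for step in steps:
        result = inspections.get(step)
        if result is True:
            suspects = []          # everything up to here is assumed good
        else:
            suspects.append(step)
            if result is False:
                return suspects    # defect observed: stop at this step
    return []                      # no defect was observed anywhere


steps = ["wafer clean", "lithography", "etch"]
inspections = {"wafer clean": True, "lithography": None, "etch": False}
print(suspect_steps(steps, inspections))
```

In this example the defect is seen after etching, but because lithography was never inspected, both steps remain candidates for the root cause.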

Failure analysis and anomaly detection:

Failure analysis in fabrication processes is often done manually, via visual inspection, to track the consistency and desired features of the microscope image datasets produced during fabrication (see Figure 3 for images from a successful controlled experiment and a failed experiment). For instance, the lithography steps mentioned above include a visual inspection step to ensure that the desired outcome of the lithography process is met. However, these inspections are rather qualitative from an academic user perspective. Whether or not the shape, the sharpness of the edge, and the colour of the photoresist look “correct” is up to the user. A quantitative, AI/ML-based method that determines whether the photoresist will yield a successful or unsuccessful process would be an extremely powerful tool [Wang2021].

Furthermore, introducing additional process variants and observing their effect may lead (1) to a tool that can predict the overall photolithography process result without wasting resources and (2) to experiments that can be extremely helpful for academic researchers and industry professionals.
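
To make the idea of a quantitative inspection criterion concrete, the sketch below scores the sharpness of an edge from a line of pixel intensities taken across it. This is a deliberately simple, hand-written measure used only for illustration; it is not the AI/ML approach of [Wang2021], and the function name and intensity values are made up.

```python
def edge_sharpness(profile: list[float]) -> float:
    """Largest intensity jump between neighbouring pixels along a line profile."""
    return max(abs(b - a) for a, b in zip(profile, profile[1:]))


# Hypothetical grey-level profiles taken across a photoresist edge.
sharp_edge = [0, 0, 0, 255, 255, 255]          # abrupt transition
blurred_edge = [0, 51, 102, 153, 204, 255]     # gradual transition

print("sharp:", edge_sharpness(sharp_edge))
print("blurred:", edge_sharpness(blurred_edge))
```

A number like this replaces "the edge looks correct" with a value that can be compared across users and runs, although a single hand-picked feature is far less capable than a trained model.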

Figure 3: Optical microscope image of a developed photoresist within a controlled environment (Controlled Experiment) vs. an excess-humidity environment (Failed Experiment)

However, the main issue is the lack of microscope image datasets produced in an academic cleanroom setting. Because academic cleanrooms run lower-volume and more customized processes, the datasets are very small and very different from one another. This makes it challenging to train an AI/ML algorithm that determines whether a fabrication process is a success or a failure.

Another challenge concerning anomaly detection is the lack of ground truth labels for the data from sensors deployed externally in cleanrooms. The large-scale sensory data (e.g., humidity, temperature, and vibration data) collected from the various sensors placed around the cleanroom equipment and from digital communication processes changes rapidly over time and is bound to be noisy. The anomalies contained within this data are often characterized by subtle process deviations, and the surrounding noise may overshadow the few, rare anomalous events. Thus, annotating these data values with the correct labels is notoriously difficult. The absence of ground truth labels makes AI/ML-based anomaly detection considerably more challenging and results in high false-positive or false-negative rates due to the dominance of spurious anomalies. Collecting and labelling the data in the wild is therefore imperative to correctly identify realistic anomalies and to ensure the robustness of AI/ML-based anomaly detection algorithms.
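
When no labels are available, a common starting point is an unsupervised outlier score. The sketch below flags readings by their modified z-score, computed from the median and the median absolute deviation (MAD), with the conventional cut-off of 3.5. It is a generic statistical baseline and not a method proposed in this article; the function name and the humidity readings are invented, and such a score will miss the subtle deviations described above.

```python
from statistics import median


def mad_anomalies(values: list[float], threshold: float = 3.5) -> list[int]:
    """Return the indices whose modified z-score exceeds `threshold`.

    The median and the median absolute deviation (MAD) are less affected
    by a few extreme readings than the mean and the standard deviation.
    """
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # at least half of the readings are identical; no scale to score against
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - center) / mad > threshold
    ]


# Hypothetical relative-humidity readings (%) with one spike at index 6.
humidity = [45.1, 44.9, 45.0, 45.2, 44.8, 45.1, 52.0, 45.0, 44.9, 45.1]
print(mad_anomalies(humidity))
```

Without ground truth, there is no way to tell whether the flagged reading is a real process anomaly or a sensor glitch, which is exactly why labelled data collected in the wild is needed to evaluate any detector.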


To find out more, click the accompanying eBook

