jongchoon/FPGA-Build

 
 

FPGA Architecture for Real-time Video Stitching

A novel architectural design for stitching video streams in real-time on an FPGA.
Explore the docs »

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. License
  7. Contact

About The Project

The designed architecture generates a video with a wider field of view by stitching two input video streams based on features and keypoints. In simple terms, the output is a panorama, but in video form. The architecture is optimized so that the output can be produced in real time.

Algorithm

The figure below illustrates the block diagram of the system depicting each step of the algorithm.

Block Diagram

The system can be broadly divided into three subsystems:

  • Preprocessing
  • SIFT Based Feature Extraction
  • Frame Stitching

Preprocessing

The input video stream for the system is in 8-bit RGB format, as shown in the figure. Each frame of the video stream has three channels corresponding to red, green and blue. The colour information in the video frames does not enhance feature detection. Moreover, computation on a three-channel 8-bit image takes more time than on a single-channel 8-bit image. Therefore, each RGB video frame is converted to an 8-bit grayscale image. The resulting grayscale images have less noise and more detail in the shadows, and they are cheaper to process, as shown in the figure.

Figure: Input image (left) and grayscale image (right)
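
The conversion step described above can be sketched in Python as follows. The README does not state which channel weights the design uses. The integer weights below (77, 150 and 29 out of 256, approximating the ITU-R BT.601 luma coefficients 0.299, 0.587 and 0.114) are an assumption, chosen because they need only multiplications and a shift. The function name `rgb_to_gray` is illustrative and is not taken from the repository.

```python
def rgb_to_gray(frame):
    """Convert a frame of (R, G, B) 8-bit tuples to 8-bit grayscale.

    Assumed weights: 77/256, 150/256, 29/256 (an integer approximation
    of the BT.601 luma coefficients). Not taken from the FPGA design.
    """
    gray = []
    for row in frame:
        # The weights sum to 256, so the shifted result stays within 0..255.
        gray.append([(77 * r + 150 * g + 29 * b) >> 8 for (r, g, b) in row])
    return gray


# A 2x2 test frame: white, black, pure red, pure blue.
frame = [[(255, 255, 255), (0, 0, 0)],
         [(255, 0, 0), (0, 0, 255)]]
print(rgb_to_gray(frame))  # [[255, 0], [76, 28]]
```

Because the three weights add up to exactly 256, a white pixel maps to 255 and no clamping is needed.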

SIFT Based Feature Extraction

Feature extraction from the grayscale images is done using the SIFT algorithm. The SIFT algorithm can be separated into two main steps:

  • Keypoint Detection

    SIFT begins with the discrete convolution of the input image with a series of Gaussian filters. A Gaussian filter is a widely used image-smoothing filter defined as:

    $$G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}} \, e^{-\frac{x^{2} + y^{2}}{2\sigma^{2}}}$$

    In the above equation, G is the value of the Gaussian kernel at the point (x, y) and σ is the Gaussian parameter. A larger value of σ produces a stronger smoothing effect on the image. Discrete convolution of the image with the Gaussian kernel generates an image with less noise and less detail. In SIFT, discrete convolution with the Gaussian kernel is done with four different values of σ. Progressively higher values of σ are used to generate a set of blurred images, called an octave.
Figure: Input image and its Gaussian-blurred versions for σ = 1.6, σ = 2.26, σ = 3.2 and σ = 4.5

For any value of σ, the coefficients of the convolution kernel should sum to unity. Because a larger σ spreads the Gaussian over more pixels, the size of the kernel increases as the value of σ increases.
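
A minimal Python sketch of how such a kernel could be generated is shown below. The rule used to pick the kernel radius, `ceil(3 * sigma)`, is an assumption made for illustration; the README does not say which kernel sizes the FPGA design uses. The function name `gaussian_kernel` is also hypothetical.

```python
import math


def gaussian_kernel(sigma, radius=None):
    """Return a normalized (2r+1) x (2r+1) Gaussian kernel as nested lists.

    The default radius ceil(3 * sigma) is an assumption for illustration.
    """
    if radius is None:
        radius = math.ceil(3 * sigma)
    raw = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            for x in range(-radius, radius + 1)]
           for y in range(-radius, radius + 1)]
    total = sum(sum(row) for row in raw)
    # Dividing by the total makes the coefficients sum to unity, so the
    # constant factor 1 / (2 * pi * sigma^2) cancels and is not needed.
    return [[v / total for v in row] for row in raw]


# Kernel side length and coefficient sum for the four sigma values.
for sigma in (1.6, 2.26, 3.2, 4.5):
    k = gaussian_kernel(sigma)
    print(sigma, len(k), round(sum(map(sum, k)), 6))
```

With this radius rule the four values of σ give kernels of side 11, 15, 21 and 29, which illustrates why the kernel grows with σ.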

Once the octave is generated, a DoG space is built from the four images in the octave. DoG stands for difference of Gaussians, a computationally efficient approximation of the Laplacian of Gaussian (LoG). The DoG space is built by computing the difference between two adjacent Gaussian scale images, pixel by pixel. The DoG space of the four images in the octave therefore has three levels.

Figure: Top-level DoG (left), middle-level DoG (centre) and bottom-level DoG (right)
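
The pixel-by-pixel subtraction can be sketched in Python as below. The sign convention (more-blurred image minus less-blurred image) and the function name `difference_of_gaussians` are assumptions, and the tiny 2x2 "images" contain made-up values that only show the data flow.

```python
def difference_of_gaussians(octave):
    """Subtract adjacent blurred images pixel by pixel.

    `octave` is a list of equally sized 2-D lists ordered by increasing
    sigma; n images give n - 1 DoG levels.
    """
    levels = []
    for lower, higher in zip(octave, octave[1:]):
        levels.append([[b - a for a, b in zip(row_lo, row_hi)]
                       for row_lo, row_hi in zip(lower, higher)])
    return levels


# Four stand-in "blurred images" (values are made up for illustration).
octave = [
    [[10, 20], [30, 40]],
    [[12, 19], [28, 41]],
    [[13, 18], [27, 41]],
    [[13, 18], [26, 42]],
]
dog = difference_of_gaussians(octave)
print(len(dog))  # 3 levels from 4 images
print(dog[0])    # [[2, -1], [-2, 1]]
```

The differences can be negative, so a hardware implementation would need a signed representation for the DoG levels.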

Keypoints are extracted from the DoG space by finding local maxima and minima. A pixel is considered a keypoint if it is a local maximum or minimum within a 26-pixel neighbourhood consisting of 9 pixels in the top level, 8 pixels in the middle level and 9 pixels in the bottom level.
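
The 26-neighbour test can be sketched in Python as follows. The use of strict comparisons (so that flat regions produce no keypoints) is an assumption, as is the function name `is_keypoint`. With three DoG levels, only pixels in the middle level have a level both above and below them.

```python
def is_keypoint(dog, level, y, x):
    """True if dog[level][y][x] is strictly greater or strictly smaller
    than all 26 neighbours in the levels above, at, and below it.

    Requires 1 <= level <= len(dog) - 2 and an interior (y, x).
    """
    centre = dog[level][y][x]
    neighbours = [dog[l][j][i]
                  for l in (level - 1, level, level + 1)
                  for j in (y - 1, y, y + 1)
                  for i in (x - 1, x, x + 1)
                  if not (l == level and j == y and i == x)]
    return centre > max(neighbours) or centre < min(neighbours)


# Three flat 3x3 DoG levels with a single peak in the middle level.
dog = [[[0] * 3 for _ in range(3)] for _ in range(3)]
dog[1][1][1] = 5
print(is_keypoint(dog, 1, 1, 1))  # True
```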


Keypoints

Figure: Keypoints from the OpenCV SIFT function (left), from a SIFT implementation in Python (centre) and from the FPGA design (right)
  • Descriptor Generation

A keypoint descriptor is a unique identifier for a particular keypoint. SIFT uses the gradient magnitude and direction around the keypoint as the basis for the descriptor. The gradient magnitude and direction at a point can be calculated by discrete convolution of the image with Sobel filters.


Sobel convolution output
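
As a rough illustration, the gradient at a single interior pixel could be computed in Python as below. The kernels are the standard 3x3 Sobel pair. They are applied here without flipping (as a correlation), which is an assumption; a true convolution would negate both components and rotate the direction by 180 degrees. The function name `sobel_gradient` is hypothetical.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_gradient(img, y, x):
    """Return (magnitude, direction) at an interior pixel.

    The direction is in degrees in the range [0, 360).
    """
    gx = sum(SOBEL_X[j][i] * img[y - 1 + j][x - 1 + i]
             for j in range(3) for i in range(3))
    gy = sum(SOBEL_Y[j][i] * img[y - 1 + j][x - 1 + i]
             for j in range(3) for i in range(3))
    magnitude = math.hypot(gx, gy)
    direction = math.degrees(math.atan2(gy, gx)) % 360.0
    return magnitude, direction


# A vertical edge: dark on the left, bright on the right.
img = [[0, 0, 255],
       [0, 0, 255],
       [0, 0, 255]]
print(sobel_gradient(img, 1, 1))  # (1020.0, 0.0)
```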

To generate the keypoint descriptor, the gradient magnitude and direction are calculated for every point inside a 16x16 window around each keypoint. The gradient magnitudes of the 16x16 window are convolved with a Gaussian kernel. The gradient magnitudes in every 4x4 cell are then combined, so that the 16x16 window is reduced to a 4x4 grid of 16 cells. Finally, the gradient directions in each of these 16 cells are sorted into eight bins. The result is a 128-element vector (16 cells × 8 bins), which acts as the keypoint descriptor.
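
One common way to read the steps above is as a magnitude-weighted orientation histogram per cell, sketched below in Python. This is an interpretation and not necessarily how the FPGA design does it: the 45-degree bin width follows from using eight bins over 360 degrees, the Gaussian weighting of the magnitudes is left out for brevity, and the function name `build_descriptor` is hypothetical.

```python
def build_descriptor(magnitude, direction):
    """Build a 128-element descriptor from 16x16 gradient data.

    magnitude, direction: 16x16 nested lists, direction in degrees
    in [0, 360). Each 4x4 cell contributes an 8-bin orientation
    histogram (45 degrees per bin) weighted by gradient magnitude.
    Gaussian weighting is omitted to keep the sketch short.
    """
    descriptor = []
    for cell_y in range(0, 16, 4):
        for cell_x in range(0, 16, 4):
            bins = [0.0] * 8
            for y in range(cell_y, cell_y + 4):
                for x in range(cell_x, cell_x + 4):
                    b = int(direction[y][x] // 45) % 8
                    bins[b] += magnitude[y][x]
            descriptor.extend(bins)
    return descriptor


# Uniform test window: unit magnitude, every gradient pointing at 90 degrees.
mag = [[1.0] * 16 for _ in range(16)]
ang = [[90.0] * 16 for _ in range(16)]
d = build_descriptor(mag, ang)
print(len(d))  # 128
print(d[:8])   # [0.0, 0.0, 16.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```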

Frame Stitching

Frame stitching is the process of combining two frames into a single image. Frame stitching is done in two steps:

  • Keypoint Matching

    The keypoint descriptors of the keypoints in the video frames from both camera sensors are compared. If the difference between the descriptors of two keypoints, one from each camera sensor, is below an error threshold, the two keypoints are considered a keypoint pair. The keypoint pair with the smallest difference between its descriptors is taken as the reference keypoint pair.

    Figure: Input image from the left camera (left) and input image from the right camera (right)
  • Image Blending

    A weighted average method is used to blend the two frames into a single image. The value of each pixel in the overlapped region is the weighted average of the corresponding pixels in the two frames. The weights are chosen based on the distance between the overlapped pixel and the border of the corresponding frame.


    Stitched image
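
The two stitching steps can be sketched in Python as below, under several assumptions that the README does not confirm: the descriptor difference is measured as a sum of absolute differences, the two frames are offset only horizontally, and the blending weights vary linearly across the overlap. The names `match_keypoints` and `blend_row` are hypothetical.

```python
def match_keypoints(desc_left, desc_right, threshold):
    """Return (best_pair, all_pairs) using the sum of absolute differences.

    A pair (i, j, distance) is kept when distance < threshold; best_pair
    is the kept pair with the smallest distance, or None if none is kept.
    """
    pairs = []
    for i, dl in enumerate(desc_left):
        for j, dr in enumerate(desc_right):
            dist = sum(abs(a - b) for a, b in zip(dl, dr))
            if dist < threshold:
                pairs.append((i, j, dist))
    best = min(pairs, key=lambda p: p[2]) if pairs else None
    return best, pairs


def blend_row(left_row, right_row, overlap):
    """Blend two rows whose last/first `overlap` pixels show the same scene.

    The weight of the left frame falls linearly towards its right border;
    the right frame receives the complementary weight.
    """
    out = list(left_row[:len(left_row) - overlap])
    for k in range(overlap):
        w_right = (k + 1) / (overlap + 1)
        l = left_row[len(left_row) - overlap + k]
        r = right_row[k]
        out.append(round((1 - w_right) * l + w_right * r))
    out.extend(right_row[overlap:])
    return out


# Short stand-in descriptors (real ones would have 128 elements).
left = [[0, 0, 0], [10, 10, 10]]
right = [[9, 10, 11], [50, 50, 50]]
print(match_keypoints(left, right, threshold=5))  # ((1, 0, 2), [(1, 0, 2)])

print(blend_row([100] * 4, [200] * 4, overlap=2))
# [100, 100, 133, 167, 200, 200]
```

In a full pipeline the size of the overlap would be derived from the positions of the reference keypoint pair; here it is simply passed in as a parameter.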

Top Level Design

The top-level block schematic of the architecture is shown in the figure below.


Block Schematic

The top level design is divided into five stages:

Getting Started

Prerequisites

The following packages need to be installed on the Linux system before running the source code.

  • Icarus Verilog

    apt-get install iverilog
  • Python

    apt-get install python3
  • OpenCV

    pip3 install opencv-contrib-python
  • numpy

    pip3 install numpy
  • PIL (Python Imaging Library, installed as Pillow)

    pip3 install pillow

Installation

  1. Clone the repo
    git clone https://github.com/AugustinJose1221/FPGA-Build.git
  2. Change working directory
    cd FPGA-Build/make
  3. Compile the design
    make create
  4. View the RTL waveform
    make simulate
  5. Generate output image
    python3 hexToImage.py

Usage

Project Tree

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Twitter: @augustinjose121
Gmail: [email protected]
Discuss: Github Discussions
