

Flow-assisted visual tracking using event cameras


Rui Jiang1 | Qinyi Wang1,2 | Shunshun Shi1 | Xiaozheng Mou1 | Shoushun Chen1,2

1 CelePixel Technology Co. Ltd, 71 Nanyang Drive, Singapore 638075

2 School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798

Abstract

The data from event cameras not only portray contours of moving objects but also inherently contain motion information. Herein, this motion information is used in event-based and frame-based object trackers to ease the challenges of occluded objects and data association, respectively. In the event-based tracker, events within a short interval are accumulated. Within the interval, the histogram of local time measurements (or 'motion histogram') is proposed as the feature describing the target and candidate regions. The mean-shift tracking approach is then applied by shifting the tracker towards maximal similarity between the motion histograms of the target and candidate regions. As for the frame-based tracker, under the assumption that a single object moves at a constant velocity on the image plane, the distribution of local timestamps is modelled, after which object-level velocities are obtained from parameter estimation. We then build a Kalman-based ensemble, in which the object-level velocities serve as an additional measurement on top of the object detection results. Experiments have been conducted to measure the performance of the proposed trackers on our self-collected data. Thanks to the assistance of motion information, the event-based tracker successfully differentiates partially overlapped objects with distinct motion profiles; the inter-frame tracker avoids data association failure on fast-moving objects and achieves fast convergence on object velocity estimation.

1 | INTRODUCTION

Object tracking, which aims to provide accurate and consistent object-level bounding boxes with unique identifications for different objects, has been widely used in surveillance, human-machine interaction, and Advanced Driver Assistance Systems (ADAS). As a low-cost sensor with abundant information, cameras have been popular in object tracking system design. New ideas and approaches have been proposed to meet higher demands on tracking performance, in particular accuracy, speed, and robustness.

Although achievements have been made, visual tracking of fast-moving objects remains an important but challenging issue. For example, it is essential for ADAS to track cutting-in pedestrians and vehicles so that potential accidents can be avoided. As shown in Figure 1, a conventional tracker may fail to associate detection results in consecutive frames because, for computational efficiency, the only clue used during association is the objects' position. Moreover, a frame-based tracker usually requires several frames as 'measurements' before 'states' such as position and velocity converge, so a higher frame rate is necessary to achieve faster convergence. Another problem in tracking comes from occluded objects. Trackers may fail in scenarios with overlapped objects, since obscured visual features may 'mislead' the tracker and cause confusion when the object reappears. It is therefore desirable to have a robust feature that describes the objects themselves while remaining insensitive to external interference.

Event cameras, a new type of vision sensor, output spatio-temporal coordinates when the intensity change on the corresponding pixel reaches a hardware-related adjustable threshold. This independent sensing principle gives the sensor high dynamic range and ultra-high temporal resolution. Making use of event cameras' motion sensing capability, we aim to design trackers that extract and abstract motion features from objects to achieve improved data association performance and faster convergence on high-speed objects under partial occlusion.

FIGURE 1 Tracking fast-moving objects. The figure shows the SAE (see Section 3.3.1 for details) where a car is moving horizontally. Due to the small intersection between the previous and current detections, the tracker assigns different IDs to the same object in consecutive frames, which leads to data association failure

This work studies the visual tracking problem based on event cameras, with the main contributions as follows.

1. We propose a new 'motion feature' for event cameras. The Surface of Active Events is used to represent pixel-level motion on the image plane. With the help of object detection bounding boxes, we build histograms of local timestamp measurements to differentiate objects with distinct motion profiles. Moreover, the distribution of timestamp measurements on a single object is derived under the assumptions of local consistencies and constant velocity. With Maximum Likelihood Estimation, statistical characteristics of objects' motion, namely mean and variance, are obtained. In short, the proposed feature utilizes the inherent information from event cameras and identifies objects by their motion instead of their appearance.

2. The extracted motion feature has been successfully integrated with both event-based and frame-based tracking frameworks. The event-based tracker, inspired by the mean-shift algorithm, utilizes histograms of local timestamp measurements as the feature. The inter-frame Kalman-type tracker takes advantage of the estimated statistical characteristics by adding measurement equations describing object-level velocities.

3. The proposed trackers have been evaluated on our self-collected dataset in different scenarios. The performance of the trackers demonstrates the effectiveness of the proposed motion feature in event-based and frame-based tracking frameworks.

The remainder of this paper is organised as follows. Section 2 presents related work in fast object tracking and event-based tracking. Section 3 first introduces the event camera, its unique data format, and the mean-shift tracker; we then discuss local motion extraction from events and the distribution of local time measurements in preparation for object-level velocity estimation. The proposed object tracking approaches, including an event-based tracker and an inter-frame tracker, are elaborated in Sections 4 and 5, respectively, followed by evaluation results in Section 6. Finally, Section 7 concludes the work.

2 | RELATED WORK

Most trackers in the literature are frame-based: absolute intensity information can be collected to enable high-level perception and understanding, but the sampling mechanism makes it hard to extract motion between frames, especially for fast-moving objects. Focusing on visual tracking of high-speed objects, many approaches have been proposed. Pyramid and other multi-scale representations have been used to deal with large motion at the cost of real-time performance or tracking accuracy [1, 2]. In Ref. [3], a set of motion models is established in a 'filter bank' to describe different types of object motion. To alleviate the limitations of mean-shift, the rough position of the target can first be obtained from the difference between frames [4], and the mean-shift is then initialised at the estimated rough position for precise tracking [5]. The authors of Ref. [6] add a motion model to the mean-shift tracker, such that the prediction from the motion model can be used as the initial search region of the mean-shift at each iteration. However, the mean-shift may fail if the motion model has not converged due to insufficient measurements of the object's velocity. The authors of Ref. [7] propose a Kalman-filter-based framework for 3-D visual pose tracking of rigid objects. Given the wireframe model of the object, the framework estimates the pose by searching for the best match between straight-line features from observation and prediction.

Event cameras have recently attracted attention in tracking due to their extraordinary performance in environments with fast motion and high dynamic range. Two main streams of approaches, Bayesian inference and optimisation, have both been applied to this area with some achievements. Following probabilistic inference, the Kalman filter and the particle filter are used for tracking in Refs. [8], [9], and [10], where the measurements are obtained from CNN-based object detection, an event-based modified mean-shift clusterer, and a 'time-image'-dependent moving object detector, respectively. In Ref. [11], an observation model for circle-shaped targets is built, and a particle filter with a constant position model is proposed for accurate ball tracking at the cost of an increased update rate. Among the optimisation-based trackers, the authors of Ref. [12] augment each event with a feature vector; the events within a local sliding window are then clustered such that the object is abstracted using feature vectors, given the initial spatio-temporal position of the object. In the training phase, a Support Vector Machine (SVM) is trained to classify the presence or absence of the object; in the tracking phase, the candidate window with the highest value of the discriminant function is deemed the tracking result. In Ref. [13], an Expectation-Maximisation (EM) algorithm is used to estimate the parameters of a Gaussian Mixture Model (GMM), which models the density of events accumulated over a finite horizon. The authors of Ref. [14] combine the advantages of frames and events by detecting features on frames and tracking them from events. The tracker follows an optimisation framework that minimizes the photometric residual between measurements from events and predictions from the generative model.

Besides the above-mentioned approaches, there are event-based asynchronous trackers in which simple and straightforward tracking rules are applied to improve real-time performance. In Ref. [15], a naive tracker is proposed that mixes the current tracking position with events within the search region. As an improved version of Ref. [15], a feature tracker is presented in Ref. [16] by assuming that a moving object generates Gaussian-distributed events on the 2-D image plane. The tracker has also been used for tracking objects such as human faces [17, 18]. To detect and track corner events, the eFast detector and a graph-based tracker are proposed in Ref. [19], with further applications in Ref. [20].

Instead of extracting feature points, this work focuses on object tracking using object-level motion features. Motion features are suitable for object tracking because it is unnecessary to 'differentiate' objects with the same location and similar velocities. Moreover, fast-moving objects, which require extra attention, usually exhibit notable motion features. Event cameras inherently output both contours and motion information and are thus ideal for motion feature extraction. These benefits of motion features can improve the performance of fast-moving object tracking under occlusion.

3 | PRELIMINARIES

This section provides the essentials of the proposed approach for readers' convenience. First, we present the data format of event cameras, focusing on its uniqueness compared with conventional image sensors. Then the mean-shift, as the optimisation approach in event-based tracking, is reviewed. Next, we introduce the 'Surface of Active Events' (SAE), which is used in local flow calculation. The assumptions and principles of local flow calculation and the statistical distribution of local measurements on a single object are detailed in the following subsection. Finally, the data flow of the proposed trackers is introduced.

3.1 | Data from the event camera

FIGURE 2 Data from a conventional image sensor (a) and an event camera (b). Besides image frames, the event camera used herein also outputs asynchronous events that reflect the intensity change on each independent pixel. Note that frames and events cannot be obtained simultaneously from a single sensor, and it takes time to switch between output data formats

Event cameras detect intensity change on each independent pixel. An event is triggered on a pixel (x, y) at time t once the absolute intensity change from time t_last to t is larger than a user-defined parameter, where t_last denotes the triggering time of the last event on (x, y). The events form an asynchronous data stream which can be represented as {⟨x, y, t⟩}. (Some other properties, such as the polarity and absolute intensity, can also be obtained from events, depending on the manufacturer and configuration.) As shown in Figure 2, conventional image sensors only output frames, while both frames and events can be obtained from event cameras. The dual output formats of event cameras support fast, responsive, and flexible processing via events and frames.
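To make the triggering rule concrete, the following Python sketch emulates it on a single pixel; the threshold value, function name, and dense intensity samples are illustrative assumptions, not part of the sensor's actual pipeline.

```python
import numpy as np

# Minimal sketch of the triggering rule on a single pixel (our emulation,
# not the sensor hardware): an event <x, y, t> fires whenever the absolute
# intensity change since the last event exceeds a user-defined threshold C.
def emulate_pixel_events(intensities, timestamps, x, y, C=0.2):
    events, reference = [], intensities[0]
    for I, t in zip(intensities[1:], timestamps[1:]):
        if abs(I - reference) > C:
            events.append((x, y, t))   # asynchronous event tuple <x, y, t>
            reference = I              # reset the reference intensity
    return events

# A steady intensity ramp yields a regular event train on this pixel.
ts = np.linspace(0.0, 1.0, 101)
print(len(emulate_pixel_events(np.linspace(0.0, 1.0, 101), ts, x=5, y=7)))
```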

3.2 | Mean-shift

The mean-shift [21] was originally proposed as a nonparametric clustering approach that iteratively moves the cluster centre towards maximum sample density until convergence, and it has been applied to visual object tracking [22, 23]. Suppose we have a set of n points with coordinates {x_i}, i = 1, …, n, in the d-dimensional Euclidean space R^d. Assuming an isotropic kernel, the Kernel Density Estimator (KDE) gives an estimated probability density function

$$\hat{f}(\mathbf{x}) = \frac{1}{n h^{d}} \sum_{i=1}^{n} K\!\left(\frac{\mathbf{x} - \mathbf{x}_i}{h}\right),$$

with a kernel K(x) that is a symmetric multivariate density, and a bandwidth h. Common kernel functions include the multivariate Epanechnikov kernel

$$K_E(\mathbf{x}) = \begin{cases} \frac{1}{2} c_d^{-1} (d+2)\left(1 - \|\mathbf{x}\|^{2}\right), & \|\mathbf{x}\| < 1, \\ 0, & \text{otherwise}, \end{cases}$$

where $c_d = \int_{\mathbb{R}^d} \mathbf{1}_{\|\mathbf{x}\| < 1}\, \mathrm{d}\mathbf{x}$ is the volume of the unit sphere in R^d, and the Gaussian (normal) kernel

$$K_N(\mathbf{x}) = (2\pi)^{-d/2} \exp\!\left(-\frac{1}{2}\|\mathbf{x}\|^{2}\right).$$

It has been derived in Ref. [22] that the gradient of the density estimate satisfies

$$\hat{\nabla} f(\mathbf{x}) \propto \left[\sum_{i=1}^{n} g\!\left(\left\|\frac{\mathbf{x} - \mathbf{x}_i}{h}\right\|^{2}\right)\right] \mathbf{m}_h(\mathbf{x}), \qquad \mathbf{m}_h(\mathbf{x}) = \frac{\sum_{i=1}^{n} \mathbf{x}_i\, g\!\left(\left\|\frac{\mathbf{x} - \mathbf{x}_i}{h}\right\|^{2}\right)}{\sum_{i=1}^{n} g\!\left(\left\|\frac{\mathbf{x} - \mathbf{x}_i}{h}\right\|^{2}\right)} - \mathbf{x},$$

where m_h(x) is called the mean-shift vector and points in the shift direction at each iteration; g(·) = -k′(·) denotes the negative derivative of k; and k : [0, ∞) → R, which ensures kernel symmetry by definition, is called the 'profile' of the kernel K, such that K(x) = k(‖x‖²). Essentially, the mean-shift algorithm finds the mode recursively by gradient ascent, given a set of data samples:

$$\mathbf{y}_{j+1} = \frac{\sum_{i=1}^{n} \mathbf{x}_i\, g\!\left(\left\|\frac{\mathbf{y}_j - \mathbf{x}_i}{h}\right\|^{2}\right)}{\sum_{i=1}^{n} g\!\left(\left\|\frac{\mathbf{y}_j - \mathbf{x}_i}{h}\right\|^{2}\right)}, \qquad j = 0, 1, \ldots$$

It has been proved that the mean-shift converges under particular conditions on the kernel.
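For illustration, a minimal Python sketch of the mode-seeking iteration follows, assuming a Gaussian kernel profile, a fixed bandwidth, and our own stopping tolerance; the function name and defaults are not from the paper.

```python
import numpy as np

# Minimal mean-shift mode seeking with a Gaussian kernel: for K(x) = k(||x||^2)
# with k(r) = exp(-r/2), the profile derivative gives g(r) = exp(-r/2) up to a
# constant that cancels in the ratio.
def mean_shift_mode(samples, start, h=1.0, tol=1e-5, max_iter=200):
    y = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        r = np.sum(((samples - y) / h) ** 2, axis=1)  # squared scaled distances
        g = np.exp(-0.5 * r)                          # kernel profile weights
        y_next = (g[:, None] * samples).sum(axis=0) / g.sum()
        if np.linalg.norm(y_next - y) < tol:
            return y_next                             # converged to a local mode
        y = y_next
    return y

pts = np.random.default_rng(0).normal([2.0, 3.0], 0.5, size=(200, 2))
print(mean_shift_mode(pts, start=[1.0, 1.0]))         # approaches [2, 3]
```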

3.3 | Local motion from events

3.3.1 | Surface of active events (SAE)

Conventional Optical Flow (OF) [24] measures the rate of intensity change on the image plane. Event cameras cannot be used directly for OF calculation because there is no synchronised absolute intensity, only 'events' and their timestamps arriving sequentially. The 'flow' herein refers to the concept of visual motion flow in the literature, rather than OF, as the visual motion flow is not derived from intensity variation but purely defined by the relation between event coordinates and timestamps. Given a sequence of events, the Surface of Active Events (SAE) Σ_e : N² → R, which maps event coordinates to timestamps, is locally defined, and we have the relation between the gradient of the SAE and the flow (u, v) in the x and y directions [25, 26]:

$$\nabla \Sigma_e = \left(\frac{\partial \Sigma_e}{\partial x}, \frac{\partial \Sigma_e}{\partial y}\right) = \left(\frac{1}{u}, \frac{1}{v}\right).$$
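For concreteness, a minimal sketch of the SAE as a dense per-pixel timestamp map is given below; the array layout and update function are our assumptions (the resolution matches the CeleX5 sensor used in Section 6).

```python
import numpy as np

# Minimal sketch (assumed data layout, not the authors' code): the SAE as a
# dense map holding the most recent trigger time of each pixel.
WIDTH, HEIGHT = 1280, 800

def update_sae(sae, events):
    """Overwrite each pixel with the timestamp of its latest event."""
    for x, y, t in events:
        sae[y, x] = t
    return sae

sae = np.zeros((HEIGHT, WIDTH))
sae = update_sae(sae, [(10, 20, 0.0010), (11, 20, 0.0012)])
# The reciprocal of the SAE's spatial gradient gives the flow components
# (u, v), as stated above.
```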

3.3.2 | Local consistencies

Local intensity consistency assumes that, at any time, the neighbourhood of any pixel (x, y) on the image plane shares the same intensity. This approximation holds especially well when the image sensor has a high resolution, such that several pixels observe one brightness-constant point. From local intensity consistency, we further assume that the events in the neighbourhood are triggered by the same object or pattern. This assumption underlies all local plane-fitting algorithms for event-based flow estimation [25, 26]. Besides, we assume local motion consistency: at any time, the neighbourhood of any pixel (x, y) on the image plane shares the same flow (u(x, y), v(x, y)). This originates from the assumption that 'OF is locally smooth', which has been exploited in conventional flow estimation approaches such as the Lucas-Kanade method [27]. Herein, the consistency is applied to acquire enough measurements while reducing the influence of noise. These assumptions are illustrated in Figure 3.

3.3.3 | Local flow calculation

Suppose we take any neighbourhood on the SAE, with timestamp t(x, y) for pixel (x, y) on the image plane. Ideally, the local flow can be computed as

$$u = \frac{x_2 - x_1}{t(x_2, y_2) - t(x_1, y_1)}, \tag{9}$$

$$v = \frac{y_2 - y_1}{t(x_2, y_2) - t(x_1, y_1)}, \tag{10}$$

for any two pixels (x1, y1) and (x2, y2) in this neighbourhood. In this work, we take 3 × 3 square neighbourhoods, where the 4-connected pixels of the centre are considered in the flow calculation. The above approach leads to a local and sparse solution: there is no prior term favouring global properties (e.g. smoothness) during the estimation, nor can any estimate be obtained for pixels without enough qualified measurements.

FIGURE 3 Local intensity consistency and local motion consistency (left), and the 'same trigger pattern' assumption (right)
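The rule above can be sketched as follows, assuming the SAE is stored as a 2-D timestamp array; averaging the two one-sided differences and the eps guard against untriggered pixels are our own simplifications.

```python
import numpy as np

# Sketch of the local flow rule: within a 3x3 patch, the 4-connected
# neighbours of the centre give time differences over a 1-pixel baseline,
# and flow = distance / time.
def local_flow(sae, x, y, eps=1e-9):
    dt_x = 0.5 * (sae[y, x + 1] - sae[y, x - 1])   # time per pixel, x direction
    dt_y = 0.5 * (sae[y + 1, x] - sae[y - 1, x])   # time per pixel, y direction
    u = 1.0 / dt_x if abs(dt_x) > eps else np.nan  # no estimate without data
    v = 1.0 / dt_y if abs(dt_y) > eps else np.nan
    return u, v
```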

3.4 | Distribution of local time measurements

This work adopts the 'constant velocity' assumption; in other words, any object is assumed to have a constant flow over a short period of time. This subsection derives the distribution of local timestamp measurements, from which the object flow can be estimated.

In practice, it is essential to consider state uncertainty and measurement uncertainty in flow estimation. For simplicity, we use Δt and Δx to represent the time interval and the distance between any two pixels, respectively, in either the x or the y direction. To avoid a singularity at zero, we model the flow uncertainty by assuming that Δt is normal:

$$\Delta t \sim \mathcal{N}\left(\mu_{\Delta t}, \sigma_{\Delta t}^{2}\right).$$

Suppose the measurement noise is zero-mean normal, denoted as $\mathcal{N}(0, \sigma_t^{2})$. From the denominators of (9) and (10), we know that the conditional distribution of the measured time interval $\widetilde{\Delta t}$ is also normal after a linear transformation. We now aim to find the unconditional distribution of the local measurements $\widetilde{\Delta t}$, given the parameters $\mu_{\Delta t}$, $\sigma_{\Delta t}$, and $\sigma_t$.

Letting f(·) denote the probability density function of the random variable (·), we have

$$f\left(\widetilde{\Delta t}\right) = \int f\left(\widetilde{\Delta t} \mid \Delta t\right) f(\Delta t)\, \mathrm{d}(\Delta t), \tag{11}$$

which, being the convolution of two normal densities, is again normal, with mean $\mu_{\Delta t}$ and a total variance combining the state uncertainty $\sigma_{\Delta t}^{2}$ with the measurement-noise contribution.
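A minimal sketch of the resulting parameter estimation follows, using the sample mean and variance as the maximum-likelihood estimates of the fitted normal; the function name and toy data are assumptions.

```python
import numpy as np

# Sketch: maximum-likelihood fit of the assumed-normal unconditional
# distribution of local time measurements gathered inside one bounding box.
# The returned variance is the 'total' variance (state plus measurement
# uncertainty), matching the quantities reported in Figure 5.
def fit_time_intervals(dt_samples):
    dt = np.asarray(dt_samples, dtype=float)
    return dt.mean(), dt.var()   # MLE of mean and of total variance

mu, var = fit_time_intervals([1.1e4, 1.3e4, 1.4e4, 1.2e4])
# The object-level flow over a baseline of dx pixels then follows as dx / mu.
```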

3.5 | Overview of the proposed approaches

FIGURE 4 The proposed object tracking approaches include an event-based tracker and a frame-based tracker. Both trackers utilize motion information that can easily be obtained from event cameras. On the left side, histograms of local time measurements are created as features of target objects; on the right side, instead of the histogram representation, statistical characteristics (namely mean and variance) of local flows are used to estimate the velocities of objects. The object detector in the middle provides bounding boxes for both trackers to establish objects' motion profiles

As shown in Figure 4, both frames and events are used in the proposed trackers. The frame-based object detection module of Ref. [28] is required in this work to initialize the tracker and to improve tracking stability. On the one hand, events between frames are assigned to objects according to the current estimated bounding boxes. An event-based variant of the mean-shift tracker, which utilizes the motion histogram to differentiate objects, is proposed to update the bounding boxes' positions asynchronously. On the other hand, events are also used to compute the local flows that describe pixel-level motion on the image plane. Combined with the bounding boxes provided by object detection, object-level velocities can be obtained from the parameter estimation detailed in Section 3.4. We then feed the object-level velocity into a Kalman-type ensemble to fuse multi-source information, including the motion model, measurements from the detector, and the object velocity estimate. The details of each module are presented in the next two sections.

4 | ASYNCHRONISED EVENT-BASED TRACKING

The overall description of the event-based tracking is given in Algorithm 1. In this algorithm, we build our tracker on the existing mean-shift framework, with the newly proposed motion feature describing objects. First, object detection results are used to set the initial properties ξ₀ of objects, mainly the bounding boxes. Then, AssignEventToObject assigns each received event to an object if the event is within the object's current estimated bounding box. The corresponding SAE is updated in UpdateSAE by Σ_e(x, y) = t. For each object that contains the current event, we try to initialize it as a target by creating the histogram of local time measurements (or 'motion histogram') in InitializeTarget. Once initialisation is done, MeanShiftTrack is executed to move the bounding box such that the similarity between the motion histograms of the target and the tracked object is maximised. The initialisation and tracking processes of the algorithm are detailed in the next two subsections.

Algorithm 1 Asynchronised Event-based Tracking
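Since the pseudocode itself is not reproduced here, the following self-contained Python skeleton illustrates the structure described above; the helper bodies are placeholders (our assumptions), not the authors' implementation.

```python
from dataclasses import dataclass, field

# Skeleton of Algorithm 1, assembled from the prose of Section 4. The two
# stubbed helpers stand for the histogram construction of Section 4.1 and
# the mean-shift step of Section 4.2.
@dataclass
class TrackedObject:
    bbox: tuple                              # (x0, y0, x1, y1) from detection
    events: list = field(default_factory=list)
    hist: object = None                      # target motion histogram q

def contains(bbox, x, y):
    x0, y0, x1, y1 = bbox
    return x0 <= x <= x1 and y0 <= y <= y1

def build_motion_histogram(obj, sae):        # stub for InitializeTarget
    return []

def mean_shift_track(obj, sae):              # stub for MeanShiftTrack
    return obj.bbox

def process_events(events, objects, sae, init_count=1000):
    for x, y, t in events:
        sae[y, x] = t                                   # UpdateSAE
        for obj in objects:
            if not contains(obj.bbox, x, y):            # AssignEventToObject
                continue
            obj.events.append((x, y, t))
            if obj.hist is None:                        # InitializationDone?
                if len(obj.events) >= init_count:       # enough accumulated events
                    obj.hist = build_motion_histogram(obj, sae)
            else:
                obj.bbox = mean_shift_track(obj, sae)   # similarity maximisation
```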

4.1 | Target initialisation

Once an event is received, InitializationDone checks whether the corresponding object has been initialised. We try to initialise an object as a target if 1) the object is not currently initialised, and 2) enough events have been accumulated to generate the motion histogram.

4.1.1 | Event accumulation

Adequate events are required to compute enough local flows so that the motion feature of objects can be fully extracted. Given a bounding box, we count the number of events assigned to this object. Once the number of events reaches a user-defined threshold, the approach proceeds to extract features and build a target model.

4.1.2 | Target representation

Since events provide motion information, we propose to use the distribution of the time measurements Δt in (11), instead of colour [23], as the objects' feature. The m-bin histograms of the target model q and the target candidate p(y) are written as

$$\mathbf{q} = \{q_u\}_{u=1,\ldots,m}, \quad \sum_{u=1}^{m} q_u = 1; \qquad \mathbf{p}(\mathbf{y}) = \{p_u(\mathbf{y})\}_{u=1,\ldots,m}, \quad \sum_{u=1}^{m} p_u(\mathbf{y}) = 1,$$

where u denotes the bin index; q_u and p_u(y) are the normalised frequencies for the target model and for the target candidate centred at y, respectively.

In particular, let us take the horizontal time interval as an example. Given a patch centred at 0 containing n horizontal time measurements Δt_u(x_i), where i = 1, …, n and x_i denotes the pixel coordinates of the i-th measurement, we can compute q_u from

$$q_u = C \sum_{i=1}^{n} k\!\left(\|\mathbf{x}_i\|^{2}\right) \delta_{iu},$$

where k(·) denotes the kernel, C is the normalisation constant ensuring that the q_u sum to one, and δ_iu represents the Kronecker delta: if Δt_u(x_i) belongs to the u-th bin, δ_iu = 1; otherwise δ_iu = 0. Similarly, considering another patch centred at y with n′ horizontal time measurements, p_u(y) can be computed from

$$p_u(\mathbf{y}) = C_h \sum_{i=1}^{n'} k\!\left(\left\|\frac{\mathbf{y} - \mathbf{x}_i}{h}\right\|^{2}\right) \delta_{iu},$$

where the bandwidth h controls the size of the patch used in computing the target candidate's histogram, and C_h is the corresponding normalisation constant.
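A sketch of the histogram construction under these definitions is shown below; the Epanechnikov profile k(r) = 1 - r (for r < 1), the bin edges, and the zero-division guard are assumptions consistent with the formulas above.

```python
import numpy as np

# Sketch of an m-bin motion histogram with kernel-weighted, normalised counts.
def motion_histogram(coords, dt_values, centre, h, bin_edges):
    coords = np.asarray(coords, dtype=float)
    m = len(bin_edges) - 1
    r = np.sum(((coords - centre) / h) ** 2, axis=1)  # squared scaled distances
    w = np.clip(1.0 - r, 0.0, None)                   # Epanechnikov profile weights
    bins = np.clip(np.digitize(dt_values, bin_edges) - 1, 0, m - 1)
    hist = np.zeros(m)
    np.add.at(hist, bins, w)                          # kernel-weighted bin counts
    return hist / max(hist.sum(), 1e-12)              # normalised frequencies
```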

4.2 | Similarity maximisation on motion histograms

In MeanShiftTrack, we use the popular Bhattacharyya coefficient [29] as the similarity metric between p(y) and q:

$$\rho(\mathbf{y}) \equiv \rho[\mathbf{p}(\mathbf{y}), \mathbf{q}] = \sum_{u=1}^{m} \sqrt{p_u(\mathbf{y})\, q_u}. \tag{19}$$

The target candidate localisation is an iterative process which shifts the target candidate centre y such that (19) converges to its maximum. Omitting the derivation, the explicit iteration can be written as [23]

$$\mathbf{y}_{j+1} = \frac{\sum_{i=1}^{n'} \mathbf{x}_i\, w_i\, g\!\left(\left\|\frac{\mathbf{y}_j - \mathbf{x}_i}{h}\right\|^{2}\right)}{\sum_{i=1}^{n'} w_i\, g\!\left(\left\|\frac{\mathbf{y}_j - \mathbf{x}_i}{h}\right\|^{2}\right)}, \tag{20}$$

where the weights are $w_i = \sum_{u=1}^{m} \sqrt{q_u / p_u(\mathbf{y}_j)}\, \delta_{iu}$ and g(·) = -k′(·). We select the Epanechnikov kernel, so that g(·) = 1 in this work.
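The similarity metric (19) and the weighted shift (20) can be sketched compactly; the function names and the eps guard are ours, and 'bins' holds each measurement's bin index.

```python
import numpy as np

def bhattacharyya(p, q):
    # Similarity between candidate and target histograms, Equation (19).
    return float(np.sum(np.sqrt(p * q)))

def meanshift_weights(bins, p, q, eps=1e-12):
    # w_i = sqrt(q_u / p_u(y_j)) for the bin u containing measurement i.
    return np.sqrt(q[bins] / np.maximum(p[bins], eps))

def shift_centre(coords, weights):
    # With the Epanechnikov kernel, g(.) = 1, so the new centre is simply the
    # weighted mean of the measurement coordinates, as in Equation (20).
    coords = np.asarray(coords, dtype=float)
    return (weights[:, None] * coords).sum(axis=0) / weights.sum()
```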

To accelerate the implementation, no event generative model [30, 31] is established in this paper. In other words, it is assumed that the object always produces a fixed spatio-temporal pattern that depends on its motion only.

5 | FLOW-ASSISTED INTER-FRAME TRACKING

As shown on the right side of Figure 4, we build the inter-frame tracker on Ref. [28] to further enhance the tracking performance on fast-moving objects. The events between two frames are collected for object-level velocity estimation. The obtained object velocities, or flows, are then used as additional measurements in the Kalman ensemble to enable a more responsive tracker. The pseudocode is shown in Algorithm 2, where we first initialize the structure array tracks and the Kalman filter parameters. Then, in each iteration, PredictKalman performs the 'prediction' step; AssignHungarian assigns object detection results to existing tracks using the Hungarian assignment algorithm [32]; EstimateFlow provides object-level flow estimates as an additional measurement z_f in the Kalman filter; the assigned detection results z_d and the corresponding object-level flows z_f are used in CorrectKalman as the 'correction' step; finally, MaintainTracks creates new tracks and deletes obsolete ones. As the core procedures, the flow estimation and the Kalman filter framework are elaborated in the subsections below.

Algorithm 2 Flow-Assisted Inter-Frame Tracking
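A self-contained skeleton of this per-frame cycle is sketched below; all five helpers are stubs named after the modules in the text and are assumptions, not the authors' MATLAB code.

```python
# Skeleton of Algorithm 2: a per-frame predict/associate/correct cycle.
def predict_kalman(track):                      # EKF 'prediction' step
    pass

def assign_hungarian(tracks, detections):       # detection-to-track assignment
    return []                                   # list of (track, detection) pairs

def estimate_flow(sae, detection):              # object-level flow measurement z_f
    return None                                 # None when the estimate is rejected

def correct_kalman(track, z_d, z_f):            # EKF 'correction' step
    pass

def maintain_tracks(tracks, detections):        # create new / delete obsolete tracks
    return tracks

def run_frame(tracks, detections, sae):
    for track in tracks:
        predict_kalman(track)
    for track, det in assign_hungarian(tracks, detections):
        z_f = estimate_flow(sae, det)
        correct_kalman(track, det, z_f)
    return maintain_tracks(tracks, detections)
```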

5.1 | Object-level flow estimation

Local flows need to be selected for robust object-level flow estimation. To reduce unreliable local flow measurements while minimizing the computational load, neighbourhoods containing pixels not triggered within a user-defined time interval, or with inconsistent gradients on the SAE, are not considered in flow estimation. In particular, suppose we have obtained the SAE Σ_e(x, y) in a neighbourhood 𝒩. The flow at the centre of 𝒩 is retained if, ∀(x, y) ∈ 𝒩, the following conditions (21), (22), and (23) hold:

where t denotes the current timestamp, τ is the user-defined time interval, and Σ_e(x, y) represents the trigger time at pixel (x, y). Equation (21) defines a temporal window for the spatial neighbourhood used in the local flow calculation. Equations (22) and (23) ensure the gradient consistency of the local SAE so that noisy timestamps are discarded.

For each bounding box provided by the offline-trained object detector, all selected local flows within its boundary are collected for object-level flow estimation, as detailed in Section 3.4. To exclude poor estimates, a decision on whether the object-level flow is accepted is made based on the estimated mean and variance. Figure 5 shows an example of flow estimation results, where the histograms represent the distribution of local measurements in the horizontal and vertical directions, while the curves are generated from the estimated mean and variance of the object-level flow. In this example, the vertical flow is not fed into the Kalman filter due to its large variance.
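The screening logic can be sketched as follows; since the exact forms of (22) and (23) are not reproduced here, a sign-consistency test on the SAE differences along each axis stands in for the gradient-consistency conditions, and this substitution is an assumption.

```python
import numpy as np

# Hedged sketch of the neighbourhood screening: condition (21) is read as
# 'every pixel triggered within the last tau'.
def neighbourhood_ok(sae, x, y, t_now, tau):
    patch = sae[y - 1:y + 2, x - 1:x + 2]      # 3x3 neighbourhood
    if np.any(t_now - patch > tau):            # (21): only recent triggers
        return False
    gx1 = patch[1, 1] - patch[1, 0]            # left  time difference
    gx2 = patch[1, 2] - patch[1, 1]            # right time difference
    gy1 = patch[1, 1] - patch[0, 1]            # upper time difference
    gy2 = patch[2, 1] - patch[1, 1]            # lower time difference
    return gx1 * gx2 > 0 and gy1 * gy2 > 0     # stand-in for (22)-(23)
```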

5.2 | Kalman inter-frame tracker

FIGURE 5 Measurement distribution on a single object. (a) The SAE of the object with a bounding box. The 16-bit image shows the relative time of the triggered events; brighter pixels are triggered later. (b) Local measurement distributions and fitted curves in the x and y directions. The parameters are estimated as: means Δt_u = 1.32 × 10⁴ and Δt_v = -2.54 × 10³; the total standard deviations (including both state and measurement uncertainties) are 5.69 × 10³ and 1.31 × 10⁴ for the x and y directions, respectively. The y-direction result (b-2) is abandoned due to its large standard deviation

Once the flow is obtained for each detected object, an inter-frame tracking framework is proposed based on a Kalman filter, in which the properties of the bounding boxes and the object-level flows are treated as measurements. Since the main structure of the inter-frame tracker has been used in our previous work [28], we present only the essential parts for readers' convenience. We define the state vector as

$$\mathbf{x} = [x, y, u, v, w, v_w, s]^{\mathrm{T}}, \tag{24}$$

which includes the bounding box centre coordinates x, y, the velocities u, v, the width w, the rate of change of the width v_w, and the width-to-height ratio s. The constant velocity process equation is written as

$$\mathbf{x}_{k+1} = F_k \mathbf{x}_k + \mathbf{w}_k, \tag{25}$$

where w_k is the process noise with covariance matrix Q_k, and

$$F_k = \begin{bmatrix} 1 & 0 & \Delta t_k & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & \Delta t_k & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & \Delta t_k & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$

with time interval Δt_k. The measurement model is expressed as

$$\mathbf{z}_k = \begin{bmatrix} \mathbf{z}_{d,k} \\ \mathbf{z}_{f,k} \end{bmatrix} = h(\mathbf{x}_k) + \mathbf{v}_k, \tag{26}$$

in which v_k is the measurement noise with covariance matrix R_k; z_{d,k} and z_{f,k} come from the object detector and the object-level flow estimator, respectively. The ideal measurement equation is

$$h(\mathbf{x}) = \left[x,\ y,\ w,\ s,\ \frac{\Delta x}{u},\ \frac{\Delta y}{v}\right]^{\mathrm{T}},$$

where Δx and Δy are pre-defined parameters measuring the neighbourhood's width and height in flow estimation. An Extended Kalman Filter (EKF) [33] is implemented based on the proposed process and measurement models (25) and (26).
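For illustration, the sketch below builds the transition matrix of (25) and the ideal measurement above; the exact layout of F_k and the form of h(x) are reconstructions from the state definition and the role of Δx and Δy, so treat both as assumptions.

```python
import numpy as np

# The flow part of h(x), z_f = (dx/u, dy/v), is nonlinear in the state,
# which is why an EKF rather than a plain Kalman filter is needed.
def transition_matrix(dt):
    F = np.eye(7)
    F[0, 2] = dt        # x_{k+1} = x_k + u_k * dt
    F[1, 3] = dt        # y_{k+1} = y_k + v_k * dt
    F[4, 5] = dt        # w_{k+1} = w_k + v_w,k * dt
    return F

def ideal_measurement(state, dx, dy):
    # Detector part: centre, width, aspect ratio; flow part: mean time
    # intervals over the neighbourhood baselines (u, v assumed nonzero).
    x, y, u, v, w, vw, s = state
    return np.array([x, y, w, s, dx / u, dy / v])
```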

6 | EXPERIMENTS

In this section, evaluations are presented to validate the performance of the proposed tracking approaches. First, we show the case of event-based tracking on fast-moving objects, with comparative results against a classic event-based tracker and its variant, modified from Refs. [15] and [16], respectively. Second, the inter-frame tracker is compared with the same tracker without flow assistance. As we focus on tracking fast-moving objects in this work, two specially collected datasets are selected for evaluation, as shown in Figure 6. In CrossingObjects, balls (in CrossingObjects-Balls) and model vehicles (in CrossingObjects-Cars) move quickly across the image in indoor simulated environments. In PedestrianCutIn, data were recorded on a moving car, and the detection results come from our pre-trained YOLO-based detector. Since the object velocity on the image plane in CrossingObjects is much larger than in PedestrianCutIn, the event-based tracker is tested on CrossingObjects, while the inter-frame tracker is evaluated on PedestrianCutIn. All data are collected using the CeleX5 sensor with a resolution of 1280 × 800.

The algorithms run in MATLAB R2018b on a mobile workstation with an Intel Core i5-8250U CPU and 8 GB of memory. The code for asynchronised event-based tracking is modified from the mean-shift implementation at https://github.com/Singingkettle/MeanShift. The code for flow-assisted inter-frame tracking is modified from our previous work [28]. Besides the figures herein, more results are available online at https://youtu.be/GYW3wJsa1Wo.

6.1 | Asynchronised event-based tracking

FIGURE 6 Evaluation dataset (see https://youtu.be/GYW3wJsa1Wo for a detailed evaluation video). Left: CrossingObjects-Balls records events-only data with crossing balls. Middle: CrossingObjects-Cars records the scenario with two crossing model cars. Right: PedestrianCutIn records both events and frames from a moving vehicle, but events are not displayed in this figure. The red and cyan boxes represent detection and tracking results (with the label and the estimated horizontal velocity), respectively. The green box is predicted from the last detection according to the flow-estimated velocity (shown as green numbers)

FIGURE 7 Mean-shift weights for CrossingObjects-Balls when the objects are close. (a) Magnified SAE for two balls, moving from left to right (yellow solid box) and from right to left (green dotted box), respectively. All triggered pixels share the same intensity in this figure. (b) Weights inside the yellow box (b-1) and the green box (b-2)

The left and middle subfigures in Figure 6 show the performance of asynchronised event-based tracking. Note that in CrossingObjects-Balls, the tracker follows the objects even when the ball with the green box is partially invisible. Figure 7 further illustrates the mean-shift weights within the tracking bounding boxes. The difference in weights validates the effectiveness of the proposed motion histogram feature. In CrossingObjects-Cars, although the tracking errors become larger for both cars, the trackers are not lost during the entire process. The larger errors have multiple causes: 1) The relatively larger bounding boxes and slower velocities lead to a longer crossing time, during which the front object may be affected while the object in the back is partially invisible. Since no motion prediction is implemented in the experiments, the tracker cannot handle the situation where the object is completely invisible. 2) The complex texture on the cars may increase the error between the event flow and the OF, making it more difficult to extract accurate motion information from local event flows as reliable features.

We first compare the proposed approach with a naive event-based tracker. The naive event-based tracker updates the object's centre c once an event with coordinates (x, y) arrives within the bounding box, based on the following rule:

$$\mathbf{c} \leftarrow (1 - \alpha)\, \mathbf{c} + \alpha\, (x, y)^{\mathrm{T}},$$

where 0 < α < 1 is the mixing factor. Note that the naive tracker fails when there is occlusion or overlap between objects. Furthermore, the mixing factor needs to be tuned in practice, as a large α may cause instability while a small α may decrease the sensitivity to movement.
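As a sketch, the rule reads as follows; the default mixing factor is an arbitrary choice for illustration.

```python
import numpy as np

# The naive per-event update: each in-box event pulls the tracked centre
# toward it with mixing factor alpha.
def naive_update(centre, event_xy, alpha=0.1):
    centre = np.asarray(centre, dtype=float)
    event = np.asarray(event_xy, dtype=float)
    return (1.0 - alpha) * centre + alpha * event
```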

We then implement an improved event-based Gaussian tracker [16, 18] for comparison. This Gaussian tracker maintains the mean and the covariance matrix of an object. The update laws are as follows:

$$\boldsymbol{\mu} \leftarrow \alpha_1\, \boldsymbol{\mu} + (1 - \alpha_1)\, \mathbf{e}, \qquad \Sigma \leftarrow \alpha_2\, \Sigma + (1 - \alpha_2)\, \Delta\Sigma,$$

where e is the event position and the covariance difference ΔΣ is computed from the current tracker position and the event position. In CrossingObjects-Balls and CrossingObjects-Cars, the parameters are set as α₁ = 0.9, α₂ = 0.99 and α₁ = 0.99, α₂ = 0.999, respectively. We observe that the Gaussian tracker suffers from a similar problem under occlusion. Moreover, we found that the estimation of the covariance matrices is influenced by the order of events, which requires the events to be in increasing time order locally. Unfortunately, the data reading mechanism of the event camera yields data whose timestamps increase only globally. In other words, sorting is necessary to unleash the potential of the Gaussian tracker. By comparison, both the naive tracker and the proposed approach are less affected by the event order: events with globally increasing timestamps are enough to achieve the demonstrated performance.
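A sketch of these update laws follows; the rank-one form of ΔΣ is an assumption consistent with the description above, not a confirmed detail of Refs. [16, 18].

```python
import numpy as np

# Sketch of the Gaussian tracker updates: low-pass mixing of mean and
# covariance driven by each incoming event.
def gaussian_update(mu, cov, event_xy, a1=0.9, a2=0.99):
    e = np.asarray(event_xy, dtype=float)
    d = e - mu
    delta_cov = np.outer(d, d)                  # spread of the event about the mean
    mu_new = a1 * mu + (1.0 - a1) * e
    cov_new = a2 * cov + (1.0 - a2) * delta_cov
    return mu_new, cov_new
```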

Although we only show results with two moving objects, the behaviour of the proposed algorithm can be predicted for tracking more objects under partial occlusion. Since no feature adaptation is implemented, the targets' motion profiles do not change after initialisation. If the constant velocity assumption still holds for all objects, and the histograms of time measurements are sufficiently distinct during initialisation, the proposed event-based tracker is highly likely to keep working, similar to the results shown in this section. Note that the proposed tracker may not handle situations where 1) objects' velocities change rapidly, or 2) the occluded area is so large that the hidden object cannot re-acquire its bounding box after crossing.

The update intervals are set to 10,000 events for CrossingObjects-Balls and 100,000 events for CrossingObjects-Cars. These parameters influence computational efficiency and tracking performance: for a given set of events, a large interval results in faster calculation but may cause tracking failure, especially for fast-moving objects, while a small interval may not accumulate enough events to extract accurate motion features. Specifically, it takes 12 s to process the 562,831 events of CrossingObjects-Balls, while processing the 4,039,059 events of CrossingObjects-Cars requires 105 s. In comparison, the naive approach takes 7 s and 75 s to process the two datasets. The average processing speeds are listed in Table 1, where the time consumption of shared modules such as display, data read-in, and event-object assignment is not counted.

TABLE 1 Processing speed of asynchronised event-based trackers

6.2 | Flow-assisted inter-frame tracking

The full testing sequence of PedestrianCutIn and the tracking results can be found in the online video. The right subfigure in Figure 6 shows the performance of flow-assisted inter-frame tracking on fast-moving objects. Note that the tracker follows the pedestrian soon after detection, and it runs properly during the entire cut-in process even when the detection results or the flow estimates are occasionally unavailable. Figure 8 further demonstrates the horizontal velocity estimation results on the pedestrian. With the help of event flows, the velocity state components in the EKF become observable, so better estimation performance is expected. In particular, the proposed tracker outputs the object's horizontal velocity as soon as it is detected. Such prompt results are essential to increasing the safety margin of ADAS in extreme scenarios.

It takes 75 ms to process one SAE frame on our testing platform. Among all modules, the local flow calculation and the object-level velocity estimation take 91.5% of the total time. In particular, extracting timestamps from the SAE and the local time calculation are simple operations but with an extremely large number of executions. The time consumed by fitting the distribution of local time measurements depends on how many measurements are obtained: a lack of measurements accelerates the calculation but may reduce the reliability of the fit.

Herein, we set several thresholds to increase the robustness of flow estimation. Estimates from scattered measurements (with large sample variances) are discarded. Small object velocities are also deemed invalid, because there is insufficient reason for a slowly moving object to require fast-tracking performance.

7 | CONCLUSION AND FUTURE WORK

Herein, the motion information contained in event camera data has been extracted and used for object tracking. Events have been accumulated to generate motion histograms, which are used as features to distinguish objects with different motion profiles. An asynchronised event-based tracker has been proposed based on maximizing the similarity between target and candidate features. The statistical characteristics of the motion measurements have been analysed, and object-level velocity estimates have been obtained such that the performance of the frame-based tracker can be improved, especially on fast-moving objects. Experiments with self-collected data have demonstrated that the motion features from event cameras are effective as supporting information beyond conventional colour or texture features.

FIGURE 8 Object velocity estimation results in PedestrianCutIn. The ground truth velocity is computed from manually labelled points on greyscale frames with timestamps. The object-level flow estimation results are fed into the proposed Kalman filter, whose output velocities are recorded as the yellow and purple curves for the filters with and without flow assistance, respectively

The event generative model has not been considered in the flow calculation. To further enhance the performance of flow estimation, frames with intensity values could be used to remove the influence of the background, so that the proposed approach adapts to more challenging environments. Moreover, a motion model could be added to the asynchronised event-based tracking to improve the performance under longer occlusions.

ORCID

Rui Jiang https://orcid.org/0000-0003-0966-2943
