News (recent news from the research group)

Limin Wang

Limin Wang (王利民)
Multimedia Computing Group
Department of Computer Science and Technology
Nanjing University
Office: CS Building 506
Email: lmwang.nju [at] gmail.com

About Me (CV)

I am a Professor in the Department of Computer Science and Technology and am also affiliated with the State Key Laboratory for Novel Software Technology, Nanjing University.

Previously, I received the B.S. degree from Nanjing University in 2011, and the Ph.D. degree from The Chinese University of Hong Kong under the supervision of Prof. Xiaoou Tang in 2015. From 2015 to 2018, I was a Post-Doctoral Researcher with Prof. Luc Van Gool in the Computer Vision Laboratory (CVL) at ETH Zurich.

News

2022-03-02: Seven papers on object detection, object tracking, action recognition etc. are accepted by CVPR 2022.
2021-07-25: Five papers on video understanding are accepted by ICCV 2021: new dataset (MultiSports), backbone (TAM), sampling method (MGSampler), detection frameworks (RTD and TRACE). For more details, please refer to our papers.
2021-07-15: We release the MultiSports dataset for spatiotemporal action detection.
2021-07-15: Our team secures the first place at ACM MM Pre-training for Video Understanding Challenge for Track 2.
2021-06-15: Our team secures the first place at CVPR Kinetics Challenge for Self-Supervised Task.
2021-06-15: Our team secures the first place at CVPR PIC Challenge for Human-Centric Spatio-Temporal Video Grounding Task.
2021-06-01: We are organizing DeeperAction Challenge at ICCV 2021, by introducing three new benchmarks on temporal action localization, spatiotemporal action detection, and part-level action parsing.
2021-04-20: The extension of TRecgNet is accepted by IJCV.
2021-04-07: We propose a target transformer for accurate anchor-free tracking, termed as TREG (code coming soon).
2021-04-07: We present a transformer decoder for direct action proposal generation, termed as RTD-Net (code coming soon).
2021-03-01: Two papers on action recognition and point cloud segmentation are accepted by CVPR 2021.
2020-12-30: We propose a new video architecture using temporal difference, termed as TDN, and release the code.
2020-07-03: Three papers on action detection and segmentation are accepted by ECCV 2020.
2020-06-28: Our proposed DSN, a dynamic version of TSN for efficient action recognition, is accepted by TIP.
2020-05-14: We propose a temporal adaptive module for video recognition, termed as TAM, and release the code.
2020-04-16: The code of our published papers will be made available at Github: MCG-NJU.
2020-04-16: We propose a fully convolutional online tracking framework, termed as FCOT, and release the code.
2020-03-10: Our proposed temporal module TEA is accepted by CVPR 2020.
2020-01-20: We propose an efficient video representation learning framework, termed as CPD, and release the code.
2020-01-15: We present an anchor-free action tubelet detector, termed as MOC-Detector and release the code.
2019-12-20: Our proposed V4D, a principled video-level representation learning framework, is accepted by ICLR 2020.
2019-11-21: Our proposed TEINet, an efficient video architecture for video recognition, is accepted by AAAI 2020.
2019-07-23: Our proposed LIP, a general alternative to average or max pooling, is accepted by ICCV 2019.
2019-03-15: Two papers are accepted by CVPR 2019: one for group activity recognition and one for RGB-D transfer learning.
2018-08-19: One paper is accepted by ECCV 2018 and one by T-PAMI.
2018-04-01: I join Nanjing University as a faculty member at Department of Computer Science and Technology.
2017-11-28: We released a recent work on video architecture design for spatiotemporal feature learning. [ arXiv ] [ Code ].
2017-09-08: We have released the TSN models learned on the Kinetics dataset. These models transfer well to the existing datasets for action recognition and detection [ Link ].
2017-09-01: One paper is accepted by ICCV 2017 and one by IJCV.
2017-07-18: I am invited to give a talk at the Workshop on Frontiers of Video Technology-2017 [ Slide ].
2017-03-28: I am co-organizing the CVPR2017 workshop and challenge on Visual Understanding by Learning from Web Data. For more details, please see the workshop page and challenge page.
2017-02-28: Two papers are accepted by CVPR 2017.
2016-12-20: We release the code and models for SR-CNN paper [ Code ].
2016-10-05: We release the code and models for Places2 scene recognition challenge [ arXiv ] [ Code ].
2016-08-03: Code and model of Temporal Segment Networks is released [ arXiv ] [ Code ].
2016-07-15: One paper is accepted by ECCV 2016 and one by BMVC 2016.
2016-06-16: Our team secures the 1st place for untrimmed video classification at ActivityNet Challenge 2016 [ Result ].
Basically, our solution is based on our works of Temporal Segment Networks (TSN) and Trajectory-pooled Deep-convolutional Descriptors (TDD).
2016-03-01: Two papers are accepted by CVPR 2016.
2015-12-10: Our SIAT_MMLAB team secures the 2nd place for scene recognition at ILSVRC 2015 [ Result ].
2015-09-30: We rank 3rd for cultural event recognition on ChaLearn Looking at People challenge, at ICCV 2015.
2015-08-07: We release the Places205-VGGNet models [ Link ].
2015-07-22: Code of Trajectory-Pooled Deep-Convolutional Descriptors (TDD) is released [ Link ].
2015-07-15: Very deep two stream ConvNets are proposed for action recognition [ Link ].
2015-03-15: We win 1st place in both tracks, action recognition and cultural event recognition, of the ChaLearn Looking at People Challenge at CVPR 2015.

Selected Publications [ Full List ] [ Google Scholar ] [ Github: MCG-NJU ]

Target Transformed Regression for Accurate Tracking
Y. Cui, C. Jiang, L. Wang, G. Wu
Technical Report, 2021.
[ Paper ] [ Code ]
A transformer for anchor-free tracking that obtains SOTA performance.

Fully Convolutional Online Tracking
Y. Cui, C. Jiang, L. Wang, G. Wu
Technical Report, 2020.
[ Paper ] [ Code ]
Online learning of both classification and regression branches in a fully convolutional manner.

Learning Spatiotemporal Features via Video and Text Pair Discrimination
T. Li, L. Wang
Technical Report, 2020.
[ Paper ] [ Code ]
We propose a weakly supervised video representation learning framework from text information.

3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop
H. Zhang, Y. Tian, X. Zhou, W. Ouyang, Y. Liu, L. Wang, Z. Sun
in IEEE International Conference on Computer Vision (ICCV), 2021.
[ Paper ] [ Code ] [ Project Page ]

MGSampler: An Explainable Sampling Strategy for Video Action Recognition
Y. Zhi, Z. Tong, L. Wang, G. Wu
in IEEE International Conference on Computer Vision (ICCV), 2021.
[ Paper ] [ Code (soon) ]
A simple, general, and explainable video sampling method.

MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions
Y. Li, L. Chen, R. He, Z. Wang, G. Wu, L. Wang
in IEEE International Conference on Computer Vision (ICCV), 2021.
[ Paper ] [ Data ] [ Code ] [ Challenge ]
A high-quality and fine-grained action detection benchmark.

TAM: Temporal Adaptive Module for Video Recognition
Z. Liu, L. Wang, W. Wu, C. Qian, T. Lu
in IEEE International Conference on Computer Vision (ICCV), 2021.
[ Paper ] [ Code ]
Temporal adaptive module of self attention + dynamic filtering for video recognition.

Relaxed Transformer Decoders for Direct Action Proposal Generation
J. Tan, J. Tang, L. Wang, G. Wu
in IEEE International Conference on Computer Vision (ICCV), 2021.
[ Paper ] [ Code ]
Transformer for direct action proposal generation

Cross-Modal Pyramid Translation for RGB-D Scene Recognition
in International Journal of Computer Vision (IJCV), 2021.
[ Paper ] [ Code ]
Journal extension of TRecgNet with pyramid translation.

TDN: Temporal Difference Networks for Efficient Action Recognition
L. Wang, Z. Tong, B. Ji, G. Wu
in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[ Paper ] [ Code ]
Temporal modeling with an explicit difference operation.

Boundary-Aware Cascade Networks for Temporal Action Segmentation
Z. Wang, Z. Gao, L. Wang, Z. Li, and G. Wu
in European Conference on Computer Vision (ECCV), 2020.
[ Paper ] [ Code ]
SOTA performance for action segmentation on three benchmarks.

Context-Aware RCNN: a Baseline for Action Detection in Videos
J. Wu, Z. Kuang, L. Wang, W. Zhang, G. Wu
in European Conference on Computer Vision (ECCV), 2020.
[ Paper ] [ Code ]
A simple baseline for action detection in videos.

Actions as Moving Points
Y. Li, Z. Wang, L. Wang, G. Wu
in European Conference on Computer Vision (ECCV), 2020.
[ Paper ] [ Code ]
MOC-detector is an anchor-free action tubelet detector, obtaining SOTA on JHMDB and UCF.

Dynamic Sampling Networks for Efficient Action Recognition in Videos
Y. Zheng, Z. Liu, T. Lu, L. Wang
in IEEE Transactions on Image Processing (TIP), 2020.
[ Paper ]
A dynamic version of TSN for efficient action recognition.

V4D: 4D Convolutional Neural Networks for Video-Level Representation Learning
S. Zhang, S. Guo, W. Huang, M. Scott, L. Wang
in International Conference on Learning Representations (ICLR), 2020.
[ Paper ] [ Code ]
V4D is an extension over TSN for video-level representation learning.

TEA: Temporal Excitation and Aggregation for Action Recognition
Y. Li, B. Ji, X. Shi, J. Zhang, B. Kang, L. Wang
in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[ Paper ] [ Code ]
We propose a lightweight temporal module for video recognition.

TEINet: Towards an Efficient Architecture for Video Recognition
Z. Liu, D. Luo, Y. Wang, L. Wang, Y. Tai, C. Wang, J. Li, F. Huang, T. Lu
in AAAI Conference on Artificial Intelligence (AAAI), 2020.
[ Paper ]
An efficient architecture for video recognition based on 2D CNN.

LIP: Local Importance-based Pooling
Z. Gao, L. Wang and G. Wu
in IEEE International Conference on Computer Vision (ICCV), 2019.
[ Paper ] [ Code ]
A general downsampling alternative to max or average pooling.

Learning Actor Relation Graphs for Group Activity Recognition
J. Wang, L. Wang, L. Wang, J. Guo and G. Wu
in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[ Paper ] [ Code ]
Obtaining SOTA performance on the Volleyball and Collective Activity datasets.

Translate-to-Recognize Networks for RGB-D Scene Recognition
D. Du, L. Wang, H. Wang, K. Zhao and G. Wu
in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[ Paper ] [ Code ] [ Project Page ]
A new cross-modal transfer framework for RGB-D scene recognition.

Appearance-and-Relation Networks for Video Classification
L. Wang, W. Li, W. Li, and L. Van Gool
in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[ Paper ] [ Code ]
A new architecture for spatiotemporal feature learning.

Transferring Deep Object and Scene Representations for Event Recognition in Still Images
L. Wang, Z. Wang, Y. Qiao, and L. Van Gool
in International Journal of Computer Vision (IJCV), 2018.
[ Paper ] [ Code ]
SOTA performance for event recognition on the ChaLearn LAP cultural event and WIDER datasets.

Temporal Action Detection with Structured Segment Networks
Y. Zhao, Y. Xiong, L. Wang, Z. Wu, X. Tang, and D. Lin
in IEEE International Conference on Computer Vision (ICCV), 2017.
[ Paper ] [ Code ]
A new framework for temporal action localization.

UntrimmedNets for Weakly Supervised Action Recognition and Detection
L. Wang, Y. Xiong, D. Lin, and L. Van Gool
in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[ Paper ] [ BibTex ] [ Code ]
An end-to-end architecture to learn from untrimmed videos.

Thin-Slicing Network: A Deep Structured Model for Pose Estimation in Videos
J. Song, L. Wang, L. Van Gool, and O. Hilliges
in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[ Paper ] [ BibTex ] [ Project Page ]
End-to-end learning of FCNs and spatio-temporal relational models.

Knowledge Guided Disambiguation for Large-Scale Scene Classification with Multi-Resolution CNNs
L. Wang, S. Guo, W. Huang, Y. Xiong, and Y. Qiao
in IEEE Transactions on Image Processing (TIP), 2017.
[ arXiv ] [ BibTex ] [ Code ]
Solution to Places2 and LSUN challenge.

Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition
Z. Wang, L. Wang, Y. Wang, B. Zhang, and Y. Qiao
in IEEE Transactions on Image Processing (TIP), 2017.
[ arXiv ] [ BibTex ] [ Code ]
A hybrid representation combining deep networks and Fisher vector.

Two-Stream SR-CNNs for Action Recognition in Videos
Y. Wang, J. Song, L. Wang, O. Hilliges, and L. Van Gool
in British Machine Vision Conference (BMVC), 2016.
[ Paper ] [ BibTex ] [ Code ]
Explicitly incorporating human and object cues for action recognition

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin, X. Tang, and L. Van Gool
in European Conference on Computer Vision (ECCV), 2016.
[ Paper ] [ BibTex ] [ Poster ] [ Code ] [ Journal Version ]
Proposing a segmental architecture and obtaining the state-of-the-art performance on UCF101 and HMDB51

CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016
Y. Xiong, L. Wang, Z. Wang, B. Zhang, H. Song, W. Li, D. Lin, Y. Qiao, L. Van Gool, and X. Tang
ActivityNet Large Scale Activity Recognition Challenge, in conjunction with CVPR, 2016.
[ Paper ] [ BibTex ] [ Presentation ] [ Code ]
Winner of ActivityNet challenge for untrimmed video classification

Actionness Estimation Using Hybrid Fully Convolutional Networks
L. Wang, Y. Qiao, X. Tang, and L. Van Gool
in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[ Paper ] [ BibTex ] [ Poster ] [ Project Page ] [ Code ]
Estimating actionness maps and generating action proposals

Real-time Action Recognition with Enhanced Motion Vector CNNs
B. Zhang, L. Wang, Z. Wang, Y. Qiao, and H. Wang
in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[ Paper ] [ BibTex ] [ Poster ] [ Project Page ] [ Code ]
Proposing a real-time action recognition system with two-stream CNNs.

Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors
L. Wang, Y. Qiao, and X. Tang
in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[ Paper ] [ BibTex ] [ Extended Abstract ] [ Poster ] [ Project Page ] [ Code ]
State-of-the-art performance: HMDB51: 65.9%, UCF101: 91.5%.


Contests

ActivityNet Large Scale Activity Recognition Challenge, 2016: Untrimmed Video Classification, Rank: 1/24.
ImageNet Large Scale Visual Recognition Challenge, 2015: Scene Recognition, Rank: 2/25.
ChaLearn Looking at People Challenge, 2015, Rank: 1/6.
THUMOS Action Recognition Challenge, 2015, Rank: 5/11.
ChaLearn Looking at People Challenge, 2014, Rank: 1/6, 4/17.
THUMOS Action Recognition Challenge, 2014, Rank: 4/14, 2/3.
ChaLearn Multi-Modal Gesture Recognition Challenge, 2013, Rank: 4/54.
THUMOS Action Recognition Challenge, 2013, Rank: 4/16.

Academic Service

Journal Reviewer

IEEE Transactions on Pattern Analysis and Machine Intelligence

IEEE Transactions on Image Processing

IEEE Transactions on Multimedia

IEEE Transactions on Circuits and Systems for Video Technology

Pattern Recognition

Pattern Recognition Letters

Image and Vision Computing

Computer Vision and Image Understanding

Conference Reviewer

IEEE Conference on Computer Vision and Pattern Recognition, 2017

IEEE International Conference on Automatic Face and Gesture Recognition, 2017

European Conference on Computer Vision, 2016

Asian Conference on Computer Vision, 2016

International Conference on Pattern Recognition, 2016

Friends

Wen Li (ETH), Jie Song (ETH), Sheng Guo (Malong), Weilin Huang (Malong), Bowen Zhang (USC), Zhe Wang (UCI), Wei Li (Google), Yuanjun Xiong (Amazon), Xiaojiang Peng (SIAT), Zhuowei Cai (Google), Xingxing Wang (NTU)

Last Updated on 24th July, 2021

Published with GitHub Pages


Hu Zhuhua (胡祝华)