Lifelong Learning of Video Diffusion Models From a Single Video Stream


PLAI group member Jason Yoo and colleagues, under the supervision of Dr. Frank Wood and Dr. Geoff Pleiss, have released a new paper on training autoregressive video diffusion models from a continuous video stream that arrives one frame at a time. The AI community has long sought models and algorithms that learn in a fundamentally human way: from birth to death, learning as we live. Our paper demonstrates that learning video diffusion models in this way is not only possible but, remarkably, can be competitive with standard offline training given the same number of gradient steps. In addition, the paper introduces three new lifelong video generative modeling datasets generated from synthetic environments of increasing complexity: Lifelong Bouncing Balls, Lifelong 3D Maze, and Lifelong PLAICraft.

Figure 1: Ground truth video frames (top row) and a lifelong learned video diffusion model’s generated video frames (middle and bottom rows) for the Lifelong PLAICraft dataset. The model is lifelong learned from a 50-hour Minecraft video using experience replay. Given the 10 initial frames marked by red borders, the model produces the next 20 frames. Despite the model’s limited parameter count of 80 million, the generated videos are diverse and closely resemble Minecraft gameplay.

How Are the Models Lifelong Learned?

In standard offline learning, video diffusion models typically train on video frames sampled independently and identically distributed (i.i.d.) from a large dataset of loosely related videos. In our lifelong learning setup, video diffusion models are instead trained online on a video stream that sequentially iterates through a single, very long video. At each training iteration, the model observes one new video frame and takes one gradient step.

The models’ task is to predict future video frames conditioned on the preceding ones. Our lifelong learning setup trains the models using a sliding window scheme. At training step t, the model conditions on a fixed number of the most recent video frames from the video stream and learns to denoise the subsequent video frames. At training step t+1, the model’s context window slides forward by one video frame, and the same procedure repeats indefinitely. This process is illustrated in the figure below.

Lifelong learning of video diffusion models from a single video stream. At training step t, the model conditions on two frames in the first half of its context window (red) and learns to denoise two frames in the second half of its context window (blue). At training step t+1, the model’s context window shifts right by one video frame, and the same procedure repeats indefinitely.
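
To make the procedure concrete, here is a minimal PyTorch-style sketch of the sliding-window training loop. The 10-frame context and 20-frame target split mirrors the generation setup in Figure 1 but is only an illustrative assumption here, and the stream interface and `diffusion_loss` helper are hypothetical placeholders rather than our exact implementation.

```python
import torch

CONTEXT_FRAMES = 10   # frames the model conditions on (red in the figure)
TARGET_FRAMES = 20    # frames the model learns to denoise (blue in the figure)
WINDOW_SIZE = CONTEXT_FRAMES + TARGET_FRAMES

def lifelong_train(model, optimizer, video_stream, diffusion_loss):
    """Train online: observe one new frame and take one gradient step per iteration."""
    window = []
    for frame in video_stream:               # the stream yields one frame tensor at a time
        window.append(frame)
        if len(window) < WINDOW_SIZE:
            continue                          # wait until the first full sliding window
        window = window[-WINDOW_SIZE:]        # slide the window forward by one frame

        frames = torch.stack(window)                   # (WINDOW_SIZE, C, H, W)
        context = frames[:CONTEXT_FRAMES]              # conditioning frames
        target = frames[CONTEXT_FRAMES:]               # frames to denoise

        loss = diffusion_loss(model, context, target)  # standard denoising diffusion loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```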

Unsurprisingly, performing SGD on a minibatch composed solely of the current timestep's sliding window frames leads to suboptimal performance. We therefore augment the minibatch with sliding windows from past timesteps that are saved in a replay buffer, a technique commonly known as experience replay. While our paper's lifelong learning results are based on experience replay, we note that this training setup is compatible with other lifelong learning algorithms.
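
As a rough illustration, the snippet below sketches how experience replay can be layered on top of the sliding-window loop above. The buffer capacity, replay batch size, uniform sampling, and random eviction policy shown here are illustrative choices, not the exact settings from the paper.

```python
import random
import torch

class ReplayBuffer:
    """Stores past sliding-window clips and samples them uniformly at random."""
    def __init__(self, capacity=50_000):
        self.capacity = capacity
        self.windows = []

    def add(self, window):
        if len(self.windows) >= self.capacity:
            # evict a randomly chosen past window once the buffer is full
            self.windows.pop(random.randrange(len(self.windows)))
        self.windows.append(window.detach())

    def sample(self, k):
        return random.sample(self.windows, min(k, len(self.windows)))

def replay_train_step(model, optimizer, buffer, current_window, diffusion_loss,
                      replay_batch_size=7):
    """One gradient step on the current window plus replayed past windows."""
    clips = [current_window] + buffer.sample(replay_batch_size)
    batch = torch.stack(clips)                       # (B, WINDOW_SIZE, C, H, W)

    context, target = batch[:, :10], batch[:, 10:]   # same 10/20 split as in the sketch above
    loss = diffusion_loss(model, context, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    buffer.add(current_window)                       # store the current window for future replay
```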

Datasets and Model Samples

As no prior work has attempted to lifelong learn video models on a continuous video stream, we introduce and experiment with three new video lifelong learning datasets: Lifelong Bouncing Balls, Lifelong 3D Maze, and Lifelong PLAICraft. Each dataset contains over a million video frames derived from a single video and is designed to test how data stream characteristics such as perceptual complexity, frame repetitiveness, rare events, and nonstationarity affect lifelong learning. These datasets present novel opportunities to study video models in a learning regime one step closer to that of biological agents. We now briefly describe each dataset and showcase samples from the lifelong learned video diffusion models.

Lifelong Bouncing Balls

Figure 2: Ground truth video frames (top row) and a lifelong learned video diffusion model’s generated video frames (middle and bottom rows) for the Lifelong Bouncing Balls datasets. Given the 10 initial video frames marked by red borders, the model produces the next 40 frames.

Lifelong Bouncing Balls is the simplest of the three datasets. It contains 1 million 32×32 RGB video frames, spanning 28 hours, that depict two colored balls deterministically bouncing around and changing colors. To assess the effect of frame detail repetitiveness on video lifelong learning, there are two versions of the dataset in which the ball colors either do or do not change irreversibly over the course of the video stream. These two versions are depicted in the left and right subfigures of Figure 2. Video diffusion models lifelong learned with experience replay generate videos with realistic ball motion and color transitions.

Lifelong 3D Maze

Figure 3: Ground truth video frames (top row) and a lifelong learned video diffusion model’s generated video frames (middle and bottom rows) for the Lifelong 3D Maze dataset. Given the 10 initial video frames marked by red borders, the model produces the next 40 frames.

Lifelong 3D Maze contains 1 million 64×64 RGB video frames that depict a first-person view of an agent navigating a 3D maze for 14 hours (if the maze feels familiar, it is because it was one of the Windows 95 screensavers). The maze is randomly generated and contains various sparsely occurring objects, such as polyhedral rocks that flip the agent and smiley faces that regenerate the maze. Video diffusion models lifelong learned with experience replay generate realistic maze traversal footage that correctly handles these rare events.

Lifelong PLAICraft

Figure 4: Ground truth video frames (top row) and a lifelong learned video diffusion model’s generated video frames (middle and bottom rows) for the Lifelong PLAICraft dataset. Given the 10 initial video frames marked by red borders, the model produces the next 20 frames.

Lifelong PLAICraft is the most complex of the three datasets. It contains 1.85 million 1280×768 RGB video frames that depict a first-person view of an anonymous player playing in a multiplayer Minecraft survival world for 54 hours. The video stream captures continuous play sessions from the PLAICraft project and contains clips featuring various biomes, mining, crafting, construction, mob fighting, and player-to-player interactions. The video stream is therefore highly nonstationary, and its characteristics change on multiple timescales (e.g., the day-night cycle vs. the player sporadically visiting their home). Video diffusion models lifelong learned with experience replay on Stable Diffusion-encoded video frames successfully capture perceptual details of the Minecraft video frames, in particular details associated with objects present in every gameplay frame (e.g., the player name, item bar, and equipped item). Interestingly, the model also captures player-like behaviors such as spontaneously opening the user inventory (Figure 1, bottom row, leftmost column) and the in-game chat interface (Figure 4, middle row, rightmost column).
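
For readers curious about the preprocessing, here is a minimal sketch of how frames can be mapped into Stable Diffusion's latent space with the diffusers library before training. The specific VAE checkpoint and normalization shown are assumptions for illustration, not necessarily the exact pipeline used in the paper.

```python
import torch
from diffusers import AutoencoderKL

# load a publicly released Stable Diffusion VAE (checkpoint choice is an assumption)
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

@torch.no_grad()
def encode_frames(frames):
    """Map RGB frames in [0, 1] with shape (B, 3, H, W) to compact latents."""
    x = frames * 2.0 - 1.0                        # the VAE expects inputs in [-1, 1]
    latents = vae.encode(x).latent_dist.sample()  # stochastic encoding
    return latents * vae.config.scaling_factor    # shape (B, 4, H/8, W/8)

@torch.no_grad()
def decode_latents(latents):
    """Invert the encoding to recover RGB frames for visualization."""
    x = vae.decode(latents / vae.config.scaling_factor).sample
    return (x.clamp(-1.0, 1.0) + 1.0) / 2.0
```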

Final Remarks

We are genuinely excited about the future of video model lifelong learning. Our findings show that moderate-sized video diffusion models, lifelong learned on just two days’ worth of video frames, can generate short, plausible videos of challenging environments like Minecraft. Looking ahead, we hypothesize that large video diffusion models lifelong learned on years’ worth of video frames could unlock the ability to generate long and temporally coherent videos of highly complex environments. As video modeling is a key component of many world models such as GameNGen and Oasis, these advancements could pave the way for new, life-like approaches to learning, planning, and control in embodied AI agents. We are eager to see where this journey leads and invite you to check out our full paper for additional details and analysis. Thank you for reading!

plaicraft.ai launch

We are proud to announce that UBC’s Behavioral Research Ethics Board has issued a certificate of approval under the minimal risk category for us to publicly release plaicraft.ai, a “free Minecraft in the cloud” generative AI research data collection project. Please consider contributing by signing up and playing Minecraft in your browser at www.plaicraft.ai.

Our audacious but achievable goal is to collect over 10,000 hours of multiplayer Minecraft gameplay and then to use this data to train AGI-like agents that can respond sensibly in video and audio perceptual environments.  No more dumb NPCs!

Visual Chain-of-Thought Diffusion Models

Images generated by our baseline, EDM. They mostly look realistic, but there are occasional artifacts – see the blobs on the chin in the first and seventh images.

Images generated by our method. We don’t see any of the artifacts that were present in images from the baseline.

At this year’s CVPR workshop on Generative Models for Computer Vision, we’ll present a simple new approach to unconditional and class-conditional image generation. It takes advantage of the following fact: conditional diffusion generative models (DGMs) produce much more realistic images than unconditional DGMs. We show in the paper that images produced by conditional DGMs get even more realistic as you condition on more information. This holds true even if you add information by making your text prompt longer for Stable Diffusion (see the images we sample from Stable Diffusion at the end). If we want to generate a large set of images (or a video), it seems like we have to either (a) start by writing out a detailed description of each image or frame, or (b) accept inferior quality.

Our paper proposes a third option: prompt a first DGM to generate a detailed image description, and then prompt a conditional DGM to generate the image given this detailed description. To avoid the cost of generating long paragraphs of text, we use a vector in the form of a CLIP image embedding for the image description. A CLIP image embedding is a vector that encodes the semantically-meaningful parts of an image in a compact format. Let’s look at some images sampled conditioned on CLIP embeddings:

Animal faces. We sampled every frame in this video using the same set of 20 CLIP embeddings so high-level features (like animal species) are shared across all frames.

Human faces. We sampled every frame in this video using the same set of 20 CLIP embeddings so, once again, high-level features are shared across frames.

We see that two images conditioned on the same CLIP embedding share a lot of the same features: the animals’ species and color patterns stay the same; the people’s age, their facial expression, and their accessories stay roughly the same. Even better, these images sampled from a conditional DGM are much more realistic than those sampled from an unconditional DGM: if we take a CLIP embedding of an image from the animal faces dataset and then sample from our conditional DGM, the resulting image is on average 56% more realistic than if we’d used an unconditional DGM (according to the Fréchet Inception Distance, a commonly-used measure of image quality).
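
For reference, the Fréchet Inception Distance compares Gaussian fits to Inception-network features of real and generated images; the "56% more realistic" figure above presumably corresponds to a relative reduction in this distance.

```latex
% Frechet Inception Distance between Inception features of real images
% (mean \mu_r, covariance \Sigma_r) and generated images (mean \mu_g, covariance \Sigma_g);
% lower values indicate more realistic samples.
\mathrm{FID}
  = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```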

Now, how well does the conditional DGM work as part of our proposed method, when we prompt it with a CLIP embedding generated by a second DGM? We find that our generated images are still 48% more realistic than those from an unconditional DGM, almost as good as when we “cheated” by taking CLIP embeddings of dataset images. In summary, even though our task is unconditional generation, we can make use of conditional DGMs, which typically make better-looking images than unconditional DGMs!
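
To summarize the pipeline in code, here is a minimal sketch of the two-stage sampling procedure described above. The names `embedding_prior`, `image_decoder`, and `sample_diffusion` are hypothetical placeholders standing in for a standard DGM sampler, and the embedding dimensionality and image resolution are illustrative assumptions, not the paper's actual interfaces.

```python
import torch

CLIP_DIM = 512   # dimensionality of the CLIP image embedding (an assumption)
IMG_RES = 64     # output image resolution (an assumption)

def visual_chain_of_thought_sample(embedding_prior, image_decoder, sample_diffusion,
                                   num_images=4):
    """Two-stage sampling: sample a CLIP embedding first, then an image conditioned on it."""
    # Stage 1: "describe" each image by sampling a CLIP embedding from a prior DGM.
    z_noise = torch.randn(num_images, CLIP_DIM)
    clip_embeddings = sample_diffusion(embedding_prior, z_noise)

    # Stage 2: generate images conditioned on the sampled embeddings; the richer
    # conditioning is what makes these samples more realistic than unconditional ones.
    x_noise = torch.randn(num_images, 3, IMG_RES, IMG_RES)
    images = sample_diffusion(image_decoder, x_noise, cond=clip_embeddings)
    return images
```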

We’re excited about future work following this direction. It’s likely that there are better quantities to condition on than CLIP embeddings – we have so far tried just a couple of alternatives. We might even be able to learn an embedder directly to maximize our image quality – doing so could lead us to a generalization of Variational Diffusion Models. We could also condition on multiple quantities – perhaps a future state-of-the-art generative model will consist of a “chain” of DGMs, each conditioning on the output of the one before. If this sounds too complex, an alternative is to simplify our method by learning a single DGM that jointly generates an image and CLIP embedding. See our paper for the full details!

An unrealistic image featuring a distorted road. We prompted Stable Diffusion to generate “Aerial photography.”

A more realistic image. We prompted Stable Diffusion to generate “Aerial photography of a patchwork of small green fields separated by brown dirt tracks between them. A large tarmac road passes through the scene from left to right.”