
You can create a new website with this list, or embed it in an existing web page by copying and pasting any of the following snippets.

JavaScript (easiest):
<script src="https://bibbase.org/show?bib=https%3A%2F%2Fasoto.ing.puc.cl%2FAlvaroPapers.bib&jsonp=1"></script>
PHP:

<?php
$contents = file_get_contents("https://bibbase.org/show?bib=https%3A%2F%2Fasoto.ing.puc.cl%2FAlvaroPapers.bib");
echo $contents; // echo is the idiomatic way to output the fetched string
?>
iFrame (not recommended):

<iframe src="https://bibbase.org/show?bib=https%3A%2F%2Fasoto.ing.puc.cl%2FAlvaroPapers.bib"></iframe>
For more details, see the documentation.
2023 (3)
A Memory Model for Question Answering from Streaming Data Supported by Rehearsal and Anticipation of Coreference Information.
Araujo, V.; Soto, A.; and Moens, M.
In ACL, 2023.
@inproceedings{Araujo:ACL:2023, title={A Memory Model for Question Answering from Streaming Data Supported by Rehearsal and Anticipation of Coreference Information}, author={V. Araujo and A. Soto and M.F. Moens}, booktitle = {{ACL}}, year = {2023}, abstract = {Existing question answering methods often assume that the input content (e.g., documents or videos) is always accessible to solve the task. Alternatively, memory networks were introduced to mimic the human process of incremental comprehension and compression of the information in a fixed-capacity memory. However, these models only learn how to maintain memory by backpropagating errors in the answers through the entire network. Instead, it has been suggested that humans have effective mechanisms to boost their memorization capacities, such as rehearsal and anticipation. Drawing inspiration from these, we propose a memory model that performs rehearsal and anticipation while processing inputs to memorize important information for solving question answering tasks from streaming data. The proposed mechanisms are applied self-supervised during training through masked modeling tasks focused on coreference information. We validate our model on a short-sequence (bAbI) dataset as well as large-sequence textual (NarrativeQA) and video (ActivityNet-QA) question answering datasets, where it achieves substantial improvements over previous memory network approaches. Furthermore, our ablation study confirms the proposed mechanisms' importance for memory models.}, url = {https://arxiv.org/abs/2305.07565}, }
Existing question answering methods often assume that the input content (e.g., documents or videos) is always accessible to solve the task. Alternatively, memory networks were introduced to mimic the human process of incremental comprehension and compression of the information in a fixed-capacity memory. However, these models only learn how to maintain memory by backpropagating errors in the answers through the entire network. Instead, it has been suggested that humans have effective mechanisms to boost their memorization capacities, such as rehearsal and anticipation. Drawing inspiration from these, we propose a memory model that performs rehearsal and anticipation while processing inputs to memorize important information for solving question answering tasks from streaming data. The proposed mechanisms are applied self-supervised during training through masked modeling tasks focused on coreference information. We validate our model on a short-sequence (bAbI) dataset as well as large-sequence textual (NarrativeQA) and video (ActivityNet-QA) question answering datasets, where it achieves substantial improvements over previous memory network approaches. Furthermore, our ablation study confirms the proposed mechanisms' importance for memory models.
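The rehearsal/anticipation idea lends itself to a compact sketch: a fixed-capacity recurrent memory consumes a stream of segment embeddings and is trained with two auxiliary objectives, reconstructing the current segment (rehearsal) and predicting the next one (anticipation). The toy PyTorch snippet below uses our own names and shapes and only illustrates the principle; the authors' model instead applies masked modeling over coreference information.

import torch
import torch.nn as nn

class StreamMemory(nn.Module):
    """Toy fixed-capacity memory over a stream of segment embeddings."""
    def __init__(self, dim=64):
        super().__init__()
        self.cell = nn.GRUCell(dim, dim)       # compresses the stream into a state
        self.rehearse = nn.Linear(dim, dim)    # reconstruct the current segment
        self.anticipate = nn.Linear(dim, dim)  # predict the next segment

    def forward(self, segments):               # segments: (time, batch, dim)
        h = torch.zeros(segments.size(1), self.cell.hidden_size)
        loss = 0.0
        for t in range(segments.size(0) - 1):
            h = self.cell(segments[t], h)
            loss = loss + nn.functional.mse_loss(self.rehearse(h), segments[t])
            loss = loss + nn.functional.mse_loss(self.anticipate(h), segments[t + 1])
        return h, loss

stream = torch.randn(10, 4, 64)       # stand-in segment embeddings
state, aux_loss = StreamMemory()(stream)
aux_loss.backward()                   # auxiliary self-supervised signal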
PIVOT: Prompting for Video Continual Learning.
Villa, A.; Alcázar, J. L.; Alfarra, M.; Alhamoud, K.; Hurtado, J.; Caba, F.; Soto, A.; and Ghanem, B.
In CVPR, 2023.
@inproceedings{AndresCVPR, author = {A. Villa and J.L. Alcázar and M. Alfarra and K. Alhamoud and J. Hurtado and F. Caba and A. Soto and B. Ghanem}, title = {PIVOT: Prompting for Video Continual Learning}, booktitle = {{CVPR}}, year = {2023}, abstract = {Modern machine learning pipelines are limited due to data availability, storage quotas, privacy regulations, and expensive annotation processes. These constraints make it difficult or impossible to train and update large-scale models on such dynamic annotated sets. Continual learning directly approaches this problem, with the ultimate goal of devising methods where a deep neural network effectively learns relevant patterns for new (unseen) classes, without significantly altering its performance on previously learned ones. In this paper, we address the problem of continual learning for video data. We introduce PIVOT, a novel method that leverages extensive knowledge in pre-trained models from the image domain, thereby reducing the number of trainable parameters and the associated forgetting. Unlike previous methods, ours is the first approach that effectively uses prompting mechanisms for continual learning without any in-domain pre-training. Our experiments show that PIVOT improves state-of-the-art methods by a significant 27\% on the 20-task ActivityNet setup.}, url = {https://arxiv.org/abs/2212.04842}, }
Modern machine learning pipelines are limited due to data availability, storage quotas, privacy regulations, and expensive annotation processes. These constraints make it difficult or impossible to train and update large-scale models on such dynamic annotated sets. Continual learning directly approaches this problem, with the ultimate goal of devising methods where a deep neural network effectively learns relevant patterns for new (unseen) classes, without significantly altering its performance on previously learned ones. In this paper, we address the problem of continual learning for video data. We introduce PIVOT, a novel method that leverages extensive knowledge in pre-trained models from the image domain, thereby reducing the number of trainable parameters and the associated forgetting. Unlike previous methods, ours is the first approach that effectively uses prompting mechanisms for continual learning without any in-domain pre-training. Our experiments show that PIVOT improves state-of-the-art methods by a significant 27% on the 20-task ActivityNet setup.
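A minimal sketch of the prompting pattern the abstract points to, with a small frozen transformer standing in for the pre-trained image backbone: only a set of per-task prompt tokens and a classification head are trained. All names, sizes, and the pooling choice are assumptions for illustration; this is not the PIVOT architecture itself.

import torch
import torch.nn as nn

class PromptedFrozenEncoder(nn.Module):
    """Toy prompt-based continual learner: frozen backbone, per-task prompts."""
    def __init__(self, dim=64, n_prompts=4, n_classes=10, n_tasks=5):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        for p in self.encoder.parameters():
            p.requires_grad = False   # the backbone is never updated
        self.prompts = nn.ParameterList(
            [nn.Parameter(0.02 * torch.randn(n_prompts, dim)) for _ in range(n_tasks)])
        self.heads = nn.ModuleList(
            [nn.Linear(dim, n_classes) for _ in range(n_tasks)])

    def forward(self, frames, task_id):   # frames: (batch, time, dim)
        p = self.prompts[task_id].expand(frames.size(0), -1, -1)
        h = self.encoder(torch.cat([p, frames], dim=1))   # prepend prompt tokens
        return self.heads[task_id](h.mean(dim=1))

model = PromptedFrozenEncoder()
logits = model(torch.randn(2, 8, 64), task_id=0)   # 2 clips, 8 frame features each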
Learning Sentence-Level Representations with Predictive Coding.
Araujo, V.; Soto, A.; and Moens, M.
Machine Learning and Knowledge Extraction, 5: 59-77. 2023.
@article{Araujo2023LearningSR, title={Learning Sentence-Level Representations with Predictive Coding}, author={V. Araujo and A. Soto and M.F. Moens}, journal={Machine Learning and Knowledge Extraction}, year={2023}, volume={5}, pages={59-77}, abstract = {Learning sentence representations is an essential and challenging topic in the deep learning and natural language processing communities. Recent methods pre-train big models on a massive text corpus, focusing mainly on learning the representation of contextualized words. As a result, these models cannot generate informative sentence embeddings since they do not explicitly exploit the structure and discourse relationships existing in contiguous sentences. Drawing inspiration from human language processing, this work explores how to improve sentence-level representations of pre-trained models by borrowing ideas from predictive coding theory. Specifically, we extend BERT-style models with bottom-up and top-down computation to predict future sentences in latent space at each intermediate layer in the networks. We conduct extensive experimentation with various benchmarks for the English and Spanish languages, designed to assess sentence- and discourse-level representations and pragmatics-focused assessments. Our results show that our approach improves sentence representations consistently for both languages. Furthermore, the experiments also indicate that our models capture discourse and pragmatics knowledge. In addition, to validate the proposed method, we carried out an ablation study and a qualitative study with which we verified that the predictive mechanism helps to improve the quality of the representations.}, url = {https://www.mdpi.com/2504-4990/5/1/5} }
Learning sentence representations is an essential and challenging topic in the deep learning and natural language processing communities. Recent methods pre-train big models on a massive text corpus, focusing mainly on learning the representation of contextualized words. As a result, these models cannot generate informative sentence embeddings since they do not explicitly exploit the structure and discourse relationships existing in contiguous sentences. Drawing inspiration from human language processing, this work explores how to improve sentence-level representations of pre-trained models by borrowing ideas from predictive coding theory. Specifically, we extend BERT-style models with bottom-up and top-down computation to predict future sentences in latent space at each intermediate layer in the networks. We conduct extensive experimentation with various benchmarks for the English and Spanish languages, designed to assess sentence- and discourse-level representations and pragmatics-focused assessments. Our results show that our approach improves sentence representations consistently for both languages. Furthermore, the experiments also indicate that our models capture discourse and pragmatics knowledge. In addition, to validate the proposed method, we carried out an ablation study and a qualitative study with which we verified that the predictive mechanism helps to improve the quality of the representations.
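A hedged reading of the mechanism in equation form (notation ours, not the paper's): at each intermediate layer l, a top-down predictor g_l tries to match the latent code of the following sentence, and the prediction error is added to the usual masked-language-modeling objective:

% h_l(s_t): layer-l representation of sentence s_t; g_l: learned top-down predictor
\mathcal{L} = \mathcal{L}_{\mathrm{MLM}}
  + \lambda \sum_{l=1}^{L} \sum_{t} \left\| g_l\!\left(h_l(s_t)\right) - h_l(s_{t+1}) \right\|_2^2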
2022 (4)
Bridging the Visual Semantic Gap in VLN via Semantically Richer Instructions.
Ossandon, J.; Earle, B.; and Soto, A.
In ECCV, 2022.
@inproceedings{Ossandn2022BridgingTV, title={Bridging the Visual Semantic Gap in VLN via Semantically Richer Instructions}, author={J. Ossandon and B. Earle and A. Soto}, booktitle={{ECCV}}, year={2022} }
How Relevant is Selective Memory Population in Lifelong Language Learning?
Araujo, V.; Balabin, H.; Hurtado, J.; Soto, Á.; and Moens, M.
ArXiv, abs/2210.00940. 2022.
@article{Araujo2022HowRI, title={How Relevant is Selective Memory Population in Lifelong Language Learning?}, author={Vladimir Araujo and Helena Balabin and Julio Hurtado and {\'A}lvaro Soto and Marie-Francine Moens}, journal={ArXiv}, year={2022}, volume={abs/2210.00940} }
Evaluation Benchmarks for Spanish Sentence Representations.
Araujo, V.; Carvallo, A.; Kundu, S.; Cañete, J.; Mendoza, M.; Mercer, R. E.; Bravo-Marquez, F.; Moens, M.; and Soto, Á.
In International Conference on Language Resources and Evaluation, 2022.
@inproceedings{Araujo2022EvaluationBF, title={Evaluation Benchmarks for Spanish Sentence Representations}, author={Vladimir Araujo and Andr{\'e}s Carvallo and Souvik Kundu and Jos{\'e} Ca{\~n}ete and Marcelo Mendoza and Robert E. Mercer and Felipe Bravo-Marquez and Marie-Francine Moens and {\'A}lvaro Soto}, booktitle={International Conference on Language Resources and Evaluation}, year={2022} }
Entropy-based Stability-Plasticity for Lifelong Learning.
Araujo, V.; Hurtado, J.; Soto, Á.; and Moens, M.
CVPR Workshop, 3720-3727. 2022.
@article{Araujo2022EntropybasedSF, title={Entropy-based Stability-Plasticity for Lifelong Learning}, author={Vladimir Araujo and Julio Hurtado and {\'A}lvaro Soto and Marie-Francine Moens}, journal={{CVPR} Workshop}, year={2022}, pages={3720-3727} }
2021 (6)
Optimizing Reusable Knowledge for Continual Learning via Metalearning.
Hurtado, J.; Raymond-Saez, A.; and Soto, A.
In NEURIPS, 2021.
@inproceedings{HurtadoEtAl:Neurips:2021, author = {J. Hurtado and A. Raymond-Saez and A. Soto}, title = {Optimizing Reusable Knowledge for Continual Learning via Metalearning}, booktitle = {{NEURIPS}}, year = {2021}, abstract = {When learning tasks over time, artificial neural networks suffer from a problem known as Catastrophic Forgetting (CF). This happens when the weights of a network are overwritten during the training of a new task causing forgetting of old information. To address this issue, we propose MetA Reusable Knowledge or MARK, a new method that fosters weight reusability instead of overwriting when learning a new task. Specifically, MARK keeps a set of shared weights among tasks. We envision these shared weights as a common Knowledge Base (KB) that is not only used to learn new tasks, but also enriched with new knowledge as the model learns new tasks. Key components behind MARK are two-fold. On the one hand, a metalearning approach provides the key mechanism to incrementally enrich the KB with new knowledge and to foster weight reusability among tasks. On the other hand, a set of trainable masks provides the key mechanism to selectively choose from the KB relevant weights to solve each task. By using MARK, we achieve state of the art results in several popular benchmarks, surpassing the best performing methods in terms of average accuracy by over 10\% on the 20-Split-MiniImageNet dataset, while achieving almost zero forgetfulness using 55\% of the number of parameters. Furthermore, an ablation study provides evidence that, indeed, MARK is learning reusable knowledge that is selectively used by each task.}, url = {https://arxiv.org/pdf/2106.05390.pdf}, }
When learning tasks over time, artificial neural networks suffer from a problem known as Catastrophic Forgetting (CF). This happens when the weights of a network are overwritten during the training of a new task causing forgetting of old information. To address this issue, we propose MetA Reusable Knowledge or MARK, a new method that fosters weight reusability instead of overwriting when learning a new task. Specifically, MARK keeps a set of shared weights among tasks. We envision these shared weights as a common Knowledge Base (KB) that is not only used to learn new tasks, but also enriched with new knowledge as the model learns new tasks. Key components behind MARK are two-fold. On the one hand, a metalearning approach provides the key mechanism to incrementally enrich the KB with new knowledge and to foster weight reusability among tasks. On the other hand, a set of trainable masks provides the key mechanism to selectively choose from the KB relevant weights to solve each task. By using MARK, we achieve state of the art results in several popular benchmarks, surpassing the best performing methods in terms of average accuracy by over 10% on the 20-Split-MiniImageNet dataset, while achieving almost zero forgetfulness using 55% of the number of parameters. Furthermore, an ablation study provides evidence that, indeed, MARK is learning reusable knowledge that is selectively used by each task.
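The knowledge-base-plus-masks idea can be sketched in a few lines: a shared bank of "knowledge blocks" and one trainable gate vector per task that softly selects which blocks each task uses, instead of overwriting shared weights. The metalearning procedure that enriches the bank is omitted; names and shapes below are our assumptions, not the MARK implementation.

import torch
import torch.nn as nn

class MaskedKnowledgeBase(nn.Module):
    """Shared bank of blocks; each task blends them through its own gate."""
    def __init__(self, dim=32, n_blocks=6, n_tasks=3, n_classes=5):
        super().__init__()
        self.blocks = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_blocks)])
        self.gates = nn.Parameter(torch.zeros(n_tasks, n_blocks))  # per-task mask logits
        self.heads = nn.ModuleList([nn.Linear(dim, n_classes) for _ in range(n_tasks)])

    def forward(self, x, task_id):
        mask = torch.sigmoid(self.gates[task_id])   # soft selection in [0, 1]
        h = sum(m * torch.relu(b(x)) for m, b in zip(mask, self.blocks))
        return self.heads[task_id](h)

kb = MaskedKnowledgeBase()
logits = kb(torch.randn(4, 32), task_id=1)   # (4, 5) logits for task 1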
Inspecting the concept knowledge graph encoded by modern language models.
Aspillaga, C.; Mendoza, M.; and Soto, A.
In ACL, 2021.
@inproceedings{AspillagaEtAl:ACL:2021, author = {C. Aspillaga and M. Mendoza and A. Soto}, title = {Inspecting the concept knowledge graph encoded by modern language models}, booktitle = {{ACL}}, year = {2021}, abstract = {The field of natural language understanding has experienced exponential progress in the last few years, with impressive results in several tasks. This success has motivated researchers to study the underlying knowledge encoded by these models. Despite this, attempts to understand their semantic capabilities have not been successful, often leading to non-conclusive, or contradictory conclusions among different works. Via a probing classifier, we extract the underlying knowledge graph of nine of the most influential language models of the last years, including word embeddings, text generators, and context encoders. This probe is based on concept relatedness, grounded on WordNet. Our results reveal that all the models encode this knowledge, but suffer from several inaccuracies. Furthermore, we show that the different architectures and training strategies lead to different model biases. We conduct a systematic evaluation to discover specific factors that explain why some concepts are challenging. We hope our insights will motivate the development of models that capture concepts more precisely.}, url = {https://aclanthology.org/2021.findings-acl.263.pdf}, }
The field of natural language understanding has experienced exponential progress in the last few years, with impressive results in several tasks. This success has motivated researchers to study the underlying knowledge encoded by these models. Despite this, attempts to understand their semantic capabilities have not been successful, often leading to non-conclusive, or contradictory conclusions among different works. Via a probing classifier, we extract the underlying knowledge graph of nine of the most influential language models of the last years, including word embeddings, text generators, and context encoders. This probe is based on concept relatedness, grounded on WordNet. Our results reveal that all the models encode this knowledge, but suffer from several inaccuracies. Furthermore, we show that the different architectures and training strategies lead to different model biases. We conduct a systematic evaluation to discover specific factors that explain why some concepts are challenging. We hope our insights will motivate the development of models that capture concepts more precisely.
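A probing classifier of this kind is mechanically simple: freeze the embeddings, build features from concept pairs, and fit a lightweight classifier to predict relatedness. The sketch below uses synthetic embeddings and random labels purely to show the shape of the pipeline; the paper grounds its labels in WordNet concept relatedness.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 50))              # stand-in for frozen model embeddings
pairs = rng.integers(0, 200, size=(1000, 2))  # concept index pairs
labels = rng.integers(0, 2, size=1000)        # hypothetical relatedness labels
X = np.hstack([emb[pairs[:, 0]], emb[pairs[:, 1]]])  # pair features

probe = LogisticRegression(max_iter=1000).fit(X[:800], labels[:800])
# ~0.5 here because labels are random; real WordNet labels are the interesting case
print("held-out probe accuracy:", probe.score(X[800:], labels[800:]))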
Augmenting BERT-style Models with Predictive Coding to Improve Discourse-level Representations.
Araujo, V.; Villa, A.; Mendoza, M.; Moens, M.; and Soto, A.
In EMNLP, 2021.
@inproceedings{VladiEtAl:ACL:2021, author = {V. Araujo and A. Villa and M. Mendoza and M. Moens and A. Soto}, title = {Augmenting BERT-style Models with Predictive Coding to Improve Discourse-level Representations}, booktitle = {{EMNLP}}, year = {2021}, abstract = {Current language models are usually trained using a self-supervised scheme, where the main focus is learning representations at the word or sentence level. However, there has been limited progress in generating useful discourse-level representations. In this work, we propose to use ideas from predictive coding theory to augment BERT-style language models with a mechanism that allows them to learn suitable discourse-level representations. As a result, our proposed approach is able to predict future sentences using explicit top-down connections that operate at the intermediate layers of the network. By experimenting with benchmarks designed to evaluate discourse-related knowledge using pre-trained sentence representations, we demonstrate that our approach improves performance in 6 out of 11 tasks by excelling in discourse relationship detection.}, url = {https://arxiv.org/pdf/2109.04602.pdf}, }
Current language models are usually trained using a self-supervised scheme, where the main focus is learning representations at the word or sentence level. However, there has been limited progress in generating useful discourse-level representations. In this work, we propose to use ideas from predictive coding theory to augment BERT-style language models with a mechanism that allows them to learn suitable discourse-level representations. As a result, our proposed approach is able to predict future sentences using explicit top-down connections that operate at the intermediate layers of the network. By experimenting with benchmarks designed to evaluate discourse-related knowledge using pre-trained sentence representations, we demonstrate that our approach improves performance in 6 out of 11 tasks by excelling in discourse relationship detection.
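As a rough code-level counterpart (our design, not the paper's architecture), each intermediate layer gets a top-down head whose target is the next sentence's representation at the same depth, and the summed prediction error becomes an auxiliary loss:

import torch
import torch.nn as nn

class PredictiveCodingAuxiliary(nn.Module):
    """Per-layer heads predict the next sentence's latent code."""
    def __init__(self, dim=64, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_layers)])
        self.predictors = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_layers)])

    def forward(self, sent_now, sent_next):   # two adjacent sentence embeddings
        loss, h_now, h_next = 0.0, sent_now, sent_next
        for layer, predict in zip(self.layers, self.predictors):
            h_now, h_next = torch.relu(layer(h_now)), torch.relu(layer(h_next))
            # detach the target so the predictor, not the target branch, closes the gap
            loss = loss + nn.functional.mse_loss(predict(h_now), h_next.detach())
        return loss

aux_loss = PredictiveCodingAuxiliary()(torch.randn(8, 64), torch.randn(8, 64))
aux_loss.backward()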
TNT: Text-Conditioned Network with Transductive Inference for Few-Shot Video Classification.
Villa, A.; Perez-Rua, J.; Araujo, V.; Niebles, J.; Escorcia, V.; and Soto, A.
In BMVC, 2021.
@inproceedings{VillaEtAl:ACL:2021, author = {A. Villa and J. Perez-Rua and V. Araujo and J.C. Niebles and V. Escorcia and A. Soto}, title = {TNT: Text-Conditioned Network with Transductive Inference for Few-Shot Video Classification}, booktitle = {{BMVC}}, year = {2021}, abstract = {Recently, few-shot learning has received increasing interest. Existing efforts have been focused on image classification, with very few attempts dedicated to the more challenging few-shot video classification problem. These few attempts aim to effectively exploit the temporal dimension in videos for better learning in low data regimes. However, they have largely ignored a key characteristic of video which could be vital for few-shot recognition, that is, videos are often accompanied by rich text descriptions. In this paper, for the first time, we propose to leverage these human-provided textual descriptions as privileged information when training a few-shot video classification model. Specifically, we formulate a text-based task conditioner to adapt video features to the few-shot learning task. Our model follows a transductive setting where query samples and support textual descriptions can be used to update the support set class prototype to further improve the task-adaptation ability of the model. Our model obtains state-of-the-art performance on four challenging benchmarks in few-shot video action classification.}, url = {https://arxiv.org/pdf/2106.11173.pdf}, }
Recently, few-shot learning has received increasing interest. Existing efforts have been focused on image classification, with very few attempts dedicated to the more challenging few-shot video classification problem. These few attempts aim to effectively exploit the temporal dimension in videos for better learning in low data regimes. However, they have largely ignored a key characteristic of video which could be vital for few-shot recognition, that is, videos are often accompanied by rich text descriptions. In this paper, for the first time, we propose to leverage these human-provided textual descriptions as privileged information when training a few-shot video classification model. Specifically, we formulate a text-based task conditioner to adapt video features to the few-shot learning task. Our model follows a transductive setting where query samples and support textual descriptions can be used to update the support set class prototype to further improve the task-adaptation ability of the model. Our model obtains state-of-the-art performance on four challenging benchmarks in few-shot video action classification.
Overcoming Catastrophic Forgetting Using Sparse Coding and Meta Learning.
Hurtado, J.; Lobel, H.; and Soto, A.
IEEE Access, 9: 88279-88290. 2021.
@article{HurtadoAccessEtAl:2021, title={Overcoming Catastrophic Forgetting Using Sparse Coding and Meta Learning}, author={J. Hurtado and Hans Lobel and Alvaro Soto}, journal={IEEE Access}, year={2021}, volume={9}, pages={88279-88290}, abstract = {Continuous learning occurs naturally in human beings. However, Deep Learning methods suffer from a problem known as Catastrophic Forgetting (CF) that consists of a model drastically decreasing its performance on previously learned tasks when it is sequentially trained on new tasks. This situation, known as task interference, occurs when a network modifies relevant weight values as it learns a new task. In this work, we propose two main strategies to face the problem of task interference in convolutional neural networks. First, we use a sparse coding technique to adaptively allocate model capacity to different tasks avoiding interference between them. Specifically, we use a strategy based on group sparse regularization to specialize groups of parameters to learn each task. Afterward, by adding binary masks, we can freeze these groups of parameters, using the rest of the network to learn new tasks. Second, we use a meta learning technique to foster knowledge transfer among tasks, encouraging weight reusability instead of overwriting. Specifically, we use an optimization strategy based on episodic training to foster learning weights that are expected to be useful to solve future tasks. Together, these two strategies help us to avoid interference by preserving compatibility with previous and future weight values. Using this approach, we achieve state-of-the-art results on popular benchmarks used to test techniques to avoid CF. In particular, we conduct an ablation study to identify the contribution of each component of the proposed method, demonstrating its ability to avoid retroactive interference with previous tasks and to promote knowledge transfer to future tasks. }, url={https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9459700} }
Continuous learning occurs naturally in human beings. However, Deep Learning methods suffer from a problem known as Catastrophic Forgetting (CF) that consists of a model drastically decreasing its performance on previously learned tasks when it is sequentially trained on new tasks. This situation, known as task interference, occurs when a network modifies relevant weight values as it learns a new task. In this work, we propose two main strategies to face the problem of task interference in convolutional neural networks. First, we use a sparse coding technique to adaptively allocate model capacity to different tasks avoiding interference between them. Specifically, we use a strategy based on group sparse regularization to specialize groups of parameters to learn each task. Afterward, by adding binary masks, we can freeze these groups of parameters, using the rest of the network to learn new tasks. Second, we use a meta learning technique to foster knowledge transfer among tasks, encouraging weight reusability instead of overwriting. Specifically, we use an optimization strategy based on episodic training to foster learning weights that are expected to be useful to solve future tasks. Together, these two strategies help us to avoid interference by preserving compatibility with previous and future weight values. Using this approach, we achieve state-of-the-art results on popular benchmarks used to test techniques to avoid CF. In particular, we conduct an ablation study to identify the contribution of each component of the proposed method, demonstrating its ability to avoid retroactive interference with previous tasks and to promote knowledge transfer to future tasks.
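The group-sparse ingredient is straightforward to write down: penalize the L2 norm of each output filter of a convolution so whole filters are driven toward zero and can later be frozen and assigned to a task. A minimal sketch (the binary masks and episodic meta-training are omitted; the penalty weight is an arbitrary choice):

import torch
import torch.nn as nn

def group_sparsity_penalty(conv: nn.Conv2d) -> torch.Tensor:
    """Group-lasso style penalty: one L2 norm per output filter, summed."""
    w = conv.weight                        # (out_ch, in_ch, k, k)
    return w.flatten(1).norm(dim=1).sum()

conv = nn.Conv2d(3, 16, kernel_size=3)
x = torch.randn(2, 3, 32, 32)
task_loss = conv(x).pow(2).mean()          # stand-in for the real task loss
loss = task_loss + 1e-3 * group_sparsity_penalty(conv)
loss.backward()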
DACT-BERT: Differentiable Adaptive Computation Time for an Efficient BERT Inference.
Eyzaguirre, C.; Rio, F. D.; Araujo, V.; and Soto, A.
ArXiv, abs/2109.11745. 2021.
@article{EyzaguirreEtAl:DACTBERT:2021, title={DACT-BERT: Differentiable Adaptive Computation Time for an Efficient BERT Inference}, author={C. Eyzaguirre and F. Del Rio and V. Araujo and A. Soto}, journal={ArXiv}, year={2021}, volume={abs/2109.11745}, abstract = {Large-scale pre-trained language models have shown remarkable results in diverse NLP applications. Unfortunately, these performance gains have been accompanied by a significant increase in computation time and model size, stressing the need to develop new or complementary strategies to increase the efficiency of these models. In this paper we propose DACTBERT, a differentiable adaptive computation time strategy for BERT-like models. DACTBERT adds an adaptive computational mechanism to BERT’s regular processing pipeline, which controls the number of Transformer blocks that need to be executed at inference time. By doing this, the model learns to combine the most appropriate intermediate representations for the task at hand. Our experiments demonstrate that our approach, when compared to the baselines, excels on a reduced computational regime and is competitive in other less restrictive ones.}, url={https://arxiv.org/pdf/2109.11745.pdf} }
Large-scale pre-trained language models have shown remarkable results in diverse NLP applications. Unfortunately, these performance gains have been accompanied by a significant increase in computation time and model size, stressing the need to develop new or complementary strategies to increase the efficiency of these models. In this paper we propose DACTBERT, a differentiable adaptive computation time strategy for BERT-like models. DACTBERT adds an adaptive computational mechanism to BERT’s regular processing pipeline, which controls the number of Transformer blocks that need to be executed at inference time. By doing this, the model learns to combine the most appropriate intermediate representations for the task at hand. Our experiments demonstrate that our approach, when compared to the baselines, excels on a reduced computational regime and is competitive in other less restrictive ones.
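One simple differentiable variant of the idea (our construction, not the paper's exact formulation): each block emits a candidate prediction and a halting confidence, and the running answer is a convex blend, so once an early block is confident, later blocks contribute little and can be skipped at inference.

import torch
import torch.nn as nn

class ToyAdaptiveDepth(nn.Module):
    """Blend per-block predictions with learned halting confidences."""
    def __init__(self, dim=64, n_blocks=4, n_classes=10):
        super().__init__()
        self.blocks = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_blocks)])
        self.classify = nn.Linear(dim, n_classes)
        self.halt = nn.Linear(dim, 1)

    def forward(self, x):
        answer = torch.zeros(x.size(0), self.classify.out_features)
        for block in self.blocks:
            x = torch.relu(block(x))
            h = torch.sigmoid(self.halt(x))   # per-example halting confidence
            answer = h * self.classify(x) + (1 - h) * answer
        return answer

logits = ToyAdaptiveDepth()(torch.randn(2, 64))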
2020 (8)
Explaining VQA predictions using visual grounding and a knowledge base.
Riquelme, F.; DeGoyenechea, A.; Zhang, Y.; Niebles, J.; and Soto, A.
Image and Vision Computing, 101(9): 1-12. 2020.
@article{RiquelmeEtAl:2020, Author = {F. Riquelme and A. De{G}oyenechea and Y. Zhang and JC. Niebles and A. Soto}, Title = {Explaining VQA predictions using visual grounding and a knowledge base}, Journal = {Image and Vision Computing}, Volume = {101}, Number = {9}, pages={1-12}, Year = {2020}, abstract = {In this work, we focus on the Visual Question Answering (VQA) task, where a model must answer a question based on an image, and the VQA-Explanations task, where an explanation is produced to support the answer. We introduce an interpretable model capable of pointing out and consuming information from a novel Knowledge Base (KB) composed of real-world relationships between objects, along with labels mined from available region descriptions and object annotations. Furthermore, this model provides a visual and textual explanations to complement the KB visualization. The use of a KB brings two important consequences: enhance predictions and improve interpretability. We achieve this by introducing a mechanism that can extract relevant information from this KB, and can point out the relations better suited for predicting the answer. A supervised attention map is generated over the KB to select the relevant relationships from it for each question-image pair. Moreover, we add image attention supervision on the explanations module to generate better visual and textual explanations. We quantitatively show that the predicted answers improve when using the KB; similarly, explanations improve with this and when adding image attention supervision. Also, we qualitatively show that the KB attention helps to improve interpretability and enhance explanations. Overall, the results support the benefits of having multiple tasks to enhance the interpretability and performance of the model.}, url={https://www.sciencedirect.com/science/article/abs/pii/S0262885620301001} }
In this work, we focus on the Visual Question Answering (VQA) task, where a model must answer a question based on an image, and the VQA-Explanations task, where an explanation is produced to support the answer. We introduce an interpretable model capable of pointing out and consuming information from a novel Knowledge Base (KB) composed of real-world relationships between objects, along with labels mined from available region descriptions and object annotations. Furthermore, this model provides visual and textual explanations to complement the KB visualization. The use of a KB brings two important consequences: enhanced predictions and improved interpretability. We achieve this by introducing a mechanism that can extract relevant information from this KB, and can point out the relations better suited for predicting the answer. A supervised attention map is generated over the KB to select the relevant relationships from it for each question-image pair. Moreover, we add image attention supervision on the explanations module to generate better visual and textual explanations. We quantitatively show that the predicted answers improve when using the KB; similarly, explanations improve with this and when adding image attention supervision. Also, we qualitatively show that the KB attention helps to improve interpretability and enhance explanations. Overall, the results support the benefits of having multiple tasks to enhance the interpretability and performance of the model.
Differentiable Adaptive Computation Time for Visual Reasoning.
Eyzaguirre, C.; and Soto, A.
In CVPR, 2020.
@inproceedings{CristobalCVPR, author = {C. Eyzaguirre and A. Soto}, title = {Differentiable Adaptive Computation Time for Visual Reasoning}, booktitle = {{CVPR}}, year = {2020}, abstract = {This paper presents a novel attention-based algorithm for achieving adaptive computation called DACT, which, unlike existing ones, is end-to-end differentiable. Our method can be used in conjunction with many networks; in particular, we study its application to the widely known MAC architecture, obtaining a significant reduction in the number of recurrent steps needed to achieve similar accuracies, therefore improving its performance to computation ratio. Furthermore, we show that by increasing the maximum number of steps used, we surpass the accuracy of even our best non-adaptive MAC, demonstrating that our approach is able to control the number of steps without impacting performance. Additional advantages provided by our approach include significantly improving interpretability by discarding useless steps and providing more insights into the underlying reasoning process. Finally, we present adaptive computation as an equivalent to an ensemble of models, similar to a mixture-of-experts formulation. Both the code and the configuration files for our experiments are made available to support further research in this area.}, url = {https://arxiv.org/abs/2004.12770}, }
This paper presents a novel attention-based algorithm for achieving adaptive computation called DACT, which, unlike existing ones, is end-to-end differentiable. Our method can be used in conjunction with many networks; in particular, we study its application to the widely known MAC architecture, obtaining a significant reduction in the number of recurrent steps needed to achieve similar accuracies, therefore improving its performance to computation ratio. Furthermore, we show that by increasing the maximum number of steps used, we surpass the accuracy of even our best non-adaptive MAC, demonstrating that our approach is able to control the number of steps without impacting performance. Additional advantages provided by our approach include significantly improving interpretability by discarding useless steps and providing more insights into the underlying reasoning process. Finally, we present adaptive computation as an equivalent to an ensemble of models, similar to a mixture-of-experts formulation. Both the code and the configuration files for our experiments are made available to support further research in this area.
GENE: Graph generation conditioned on named entities for polarity and controversy detection in social media.
Mendoza, M.; Parra, D.; and Soto, A.
Information Processing and Management, 57(6). 2020.
@article{MendozaEtAl:2020, Author = {M. Mendoza and D. Parra and A. Soto}, Title = {GENE: Graph generation conditioned on named entities for polarity and controversy detection in social media}, Journal = {Information Processing and Management}, Volume = {57}, Number = {6}, Year = {2020}, abstract = {Many of the interactions between users on social networks are controversial, specially in polarized environments. In effect, rather than producing a space for deliberation, these environments foster the emergence of users that disqualify the position of others. On news sites, comments on the news are characterized by such interactions. This is detrimental to the construction of a deliberative and democratic climate, stressing the need for automatic tools that can provide an early detection of polarization and controversy. We introduce GENE (graph generation conditioned on named entities), a representation of user networks conditioned on the named entities (personalities, brands, organizations) which users comment upon. GENE models the leaning that each user has concerning entities mentioned in the news. GENE graphs is able to segment the user network according to their polarity. Using the segmented network, we study the performance of two controversy indices, the existing Random Walks Controversy (RWC) and another one we introduce, Relative Closeness Controversy (RCC). These indices measure the interaction between the network’s poles providing a metric to quantify the emergence of controversy. To evaluate the performance of GENE, we model the network of users of a popular news site in Chile, collecting data in an observation window of more than three years. A large-scale evaluation using GENE, on thousands of news, allows us to conclude that over 60 percent of user comments have a predictable polarity. This predictability of the user interaction scenario allows both controversy indices to detect a controversy successfully. In particular, our introduced RCC index shows satisfactory performance in the early detection of controversies using partial information collected during the first hours of the news event, with a sensitivity to the target class exceeding 90 percent.}, url={https://www.sciencedirect.com/science/article/abs/pii/S030645732030861X} }
Many of the interactions between users on social networks are controversial, especially in polarized environments. In effect, rather than producing a space for deliberation, these environments foster the emergence of users that disqualify the position of others. On news sites, comments on the news are characterized by such interactions. This is detrimental to the construction of a deliberative and democratic climate, stressing the need for automatic tools that can provide an early detection of polarization and controversy. We introduce GENE (graph generation conditioned on named entities), a representation of user networks conditioned on the named entities (personalities, brands, organizations) which users comment upon. GENE models the leaning that each user has concerning entities mentioned in the news. GENE graphs are able to segment the user network according to their polarity. Using the segmented network, we study the performance of two controversy indices, the existing Random Walks Controversy (RWC) and another one we introduce, Relative Closeness Controversy (RCC). These indices measure the interaction between the network’s poles, providing a metric to quantify the emergence of controversy. To evaluate the performance of GENE, we model the network of users of a popular news site in Chile, collecting data in an observation window of more than three years. A large-scale evaluation using GENE, on thousands of news items, allows us to conclude that over 60 percent of user comments have a predictable polarity. This predictability of the user interaction scenario allows both controversy indices to detect a controversy successfully. In particular, our introduced RCC index shows satisfactory performance in the early detection of controversies using partial information collected during the first hours of the news event, with a sensitivity to the target class exceeding 90 percent.
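For intuition about walk-based controversy indices such as RWC (the paper's RCC is a different, closeness-based index), here is a sketch on a synthetic two-community graph: random walks that rarely cross sides indicate polarization. Every name and parameter below is illustrative, not taken from the paper.

import random
import networkx as nx

random.seed(0)
G = nx.planted_partition_graph(2, 50, p_in=0.2, p_out=0.01, seed=0)
side = {n: 0 if n < 50 else 1 for n in G}   # ground-truth pole of each user

def walk_end_side(start, steps=10):
    node = start
    for _ in range(steps):
        node = random.choice(list(G.neighbors(node)))
    return side[node]

walks = [(n, walk_end_side(n)) for n in G for _ in range(5)]
stay = sum(side[n] == end for n, end in walks) / len(walks)
print(f"fraction of walks ending on their own side: {stay:.2f}")  # near 1.0 = polarized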
CompactNets: Compact Hierarchical Compositional Networks for Visual Recognition.
Lobel, H.; Vidal, R.; and Soto, A.
Computer Vision and Image Understanding, 191(2). 2020.
@article{Lobel:EtAl:2020, Author = {H. Lobel and R. Vidal and A. Soto}, Title = {CompactNets: Compact Hierarchical Compositional Networks for Visual Recognition}, Journal = {Computer Vision and Image Understanding }, Volume = {191}, Number = {2}, Year = {2020}, abstract = {CNN-based models currently provide state-of-the-art performance in image categorization tasks. While these methods are powerful in terms of representational capacity, they are generally not conceived with explicit means to control complexity. This might lead to scenarios where resources are used in a non-optimal manner, increasing the number of unspecialized or repeated neurons, and overfitting to data. In this work we propose CompactNets, a new approach to visual recognition that learns a hierarchy of shared, discriminative, specialized, and compact representations. CompactNets naturally capture the notion of compositional compactness, a characterization of complexity in compositional models, consisting on using the smallest number of patterns to build a suitable visual representation. We employ a structural regularizer with group-sparse terms in the objective function, that induces on each layer, an efficient and effective use of elements from the layer below. In particular, this allows groups of top-level features to be specialized based on category information. We evaluate CompactNets on the ILSVRC12 dataset, obtaining compact representations and competitive performance, using an order of magnitude less parameters than common CNN-based approaches. We show that CompactNets are able to outperform other group-sparse-based approaches, in terms of performance and compactness. Finally, transfer-learning experiments on small-scale datasets demonstrate high generalization power, providing remarkable categorization performance with respect to alternative approaches.}, url = {http://www.vision.jhu.edu/assets/CompactNets_CVIU19.pdf} }
CNN-based models currently provide state-of-the-art performance in image categorization tasks. While these methods are powerful in terms of representational capacity, they are generally not conceived with explicit means to control complexity. This might lead to scenarios where resources are used in a non-optimal manner, increasing the number of unspecialized or repeated neurons, and overfitting to data. In this work we propose CompactNets, a new approach to visual recognition that learns a hierarchy of shared, discriminative, specialized, and compact representations. CompactNets naturally capture the notion of compositional compactness, a characterization of complexity in compositional models, consisting of using the smallest number of patterns to build a suitable visual representation. We employ a structural regularizer with group-sparse terms in the objective function, that induces, on each layer, an efficient and effective use of elements from the layer below. In particular, this allows groups of top-level features to be specialized based on category information. We evaluate CompactNets on the ILSVRC12 dataset, obtaining compact representations and competitive performance, using an order of magnitude less parameters than common CNN-based approaches. We show that CompactNets are able to outperform other group-sparse-based approaches, in terms of performance and compactness. Finally, transfer-learning experiments on small-scale datasets demonstrate high generalization power, providing remarkable categorization performance with respect to alternative approaches.
A Survey on Deep Learning and Explainability for Automatic Image-based Medical Report Generation.
Messina, P.; Pino, P.; Parra, D.; Soto, A.; Besa, C.; Uribe, S.; Andía, M.; Tejos, C.; Prieto, C.; and Capurro, D.
arXiv. 2020.
@article{MessinaEtAl:2020, Author = {P. Messina and P. Pino and D. Parra and A. Soto and C. Besa and S. Uribe and M. Andía and C. Tejos and C. Prieto and D. Capurro}, Title = {A Survey on Deep Learning and Explainability for Automatic Image-based Medical Report Generation}, Journal = {arXiv}, Year = {2020}, abstract = {Every year physicians face an increasing demand for image-based diagnosis from patients, a problem that can be addressed with recent artificial intelligence methods. In this context, we survey works in the area of automatic report generation from medical images, with emphasis on methods using deep neural networks, with respect to: (1) Datasets, (2) Architecture Design, (3) Explainability and (4) Evaluation Metrics. Our survey identifies interesting developments, but also remaining challenges. Among them, the current evaluation of generated reports is especially weak, since it mostly relies on traditional Natural Language Processing (NLP) metrics, which do not accurately capture medical correctness.}, url = {https://arxiv.org/abs/2010.10563} }
Every year physicians face an increasing demand for image-based diagnosis from patients, a problem that can be addressed with recent artificial intelligence methods. In this context, we survey works in the area of automatic report generation from medical images, with emphasis on methods using deep neural networks, with respect to: (1) Datasets, (2) Architecture Design, (3) Explainability and (4) Evaluation Metrics. Our survey identifies interesting developments, but also remaining challenges. Among them, the current evaluation of generated reports is especially weak, since it mostly relies on traditional Natural Language Processing (NLP) metrics, which do not accurately capture medical correctness.
Automatic document screening of medical literature using word and text embeddings in an active learning setting.
Carvallo, A.; Parra, D.; Lobel, H.; and Soto, A.
Scientometrics. 2020.
@article{CarvalloEtAl:2020, Author = {A. Carvallo and D. Parra and H. Lobel and A. Soto}, Title = {Automatic document screening of medical literature using word and text embeddings in an active learning setting}, Journal = {Scientometrics}, Year = {2020}, abstract = {Document screening is a fundamental task within Evidence-based Medicine (EBM), a practice that provides scientific evidence to support medical decisions. Several approaches have tried to reduce physicians’ workload of screening and labeling vast amounts of documents to answer clinical questions. Previous works tried to semi-automate document screening, reporting promising results, but their evaluation was conducted on small datasets, which hinders generalization. Moreover, recent works in natural language processing have introduced neural language models, but none have compared their performance in EBM. In this paper, we evaluate the impact of several document representations such as TF-IDF along with neural language models (BioBERT, BERT, Word2Vec, and GloVe) on an active learning-based setting for document screening in EBM. Our goal is to reduce the number of documents that physicians need to label to answer clinical questions. We evaluate these methods using both a small challenging dataset (CLEF eHealth 2017) as well as a larger one but easier to rank (Epistemonikos). Our results indicate that word as well as textual neural embeddings always outperform the traditional TF-IDF representation. When comparing among neural and textual embeddings, in the CLEF eHealth dataset the models BERT and BioBERT yielded the best results. On the larger dataset, Epistemonikos, Word2Vec and BERT were the most competitive, showing that BERT was the most consistent model across different corpuses. In terms of active learning, an uncertainty sampling strategy combined with a logistic regression achieved the best performance overall, above other methods under evaluation, and in fewer iterations. Finally, we compared the results of evaluating our best models, trained using active learning, with other authors methods from CLEF eHealth, showing better results in terms of work saved for physicians in the document-screening task.}, url = {https://link.springer.com/article/10.1007/s11192-020-03648-6} }
Document screening is a fundamental task within Evidence-based Medicine (EBM), a practice that provides scientific evidence to support medical decisions. Several approaches have tried to reduce physicians’ workload of screening and labeling vast amounts of documents to answer clinical questions. Previous works tried to semi-automate document screening, reporting promising results, but their evaluation was conducted on small datasets, which hinders generalization. Moreover, recent works in natural language processing have introduced neural language models, but none have compared their performance in EBM. In this paper, we evaluate the impact of several document representations such as TF-IDF along with neural language models (BioBERT, BERT, Word2Vec, and GloVe) on an active learning-based setting for document screening in EBM. Our goal is to reduce the number of documents that physicians need to label to answer clinical questions. We evaluate these methods using both a small challenging dataset (CLEF eHealth 2017) as well as a larger one but easier to rank (Epistemonikos). Our results indicate that word as well as textual neural embeddings always outperform the traditional TF-IDF representation. When comparing among neural and textual embeddings, in the CLEF eHealth dataset the models BERT and BioBERT yielded the best results. On the larger dataset, Epistemonikos, Word2Vec and BERT were the most competitive, showing that BERT was the most consistent model across different corpuses. In terms of active learning, an uncertainty sampling strategy combined with a logistic regression achieved the best performance overall, above other methods under evaluation, and in fewer iterations. Finally, we compared the results of evaluating our best models, trained using active learning, with other authors' methods from CLEF eHealth, showing better results in terms of work saved for physicians in the document-screening task.
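The loop that worked best in this study (uncertainty sampling with a logistic regression) reduces to a few lines. The sketch below uses synthetic features as stand-ins for document embeddings; batch size, rounds, and seed-set size are arbitrary choices, not the paper's settings.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
labeled = list(range(20))                        # small labeled seed set
pool = [i for i in range(len(X)) if i not in labeled]

for round_ in range(5):
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[pool])[:, 1]
    uncertainty = -np.abs(proba - 0.5)           # closest to 0.5 = most uncertain
    picks = np.argsort(uncertainty)[-10:]        # query 10 documents per round
    labeled += [pool[i] for i in picks]
    pool = [i for j, i in enumerate(pool) if j not in set(picks)]
    # accuracy over all documents, as a rough progress proxy
    print(f"round {round_}: accuracy={clf.score(X, y):.3f}")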
Translating Natural Language Instructions for Behavioral Robot Navigation with a Multi-Head Attention Mechanism.
Cerda-Mardini, P.; Araujo, V.; and Soto, A.
In ACL WiNLP workshop, 2020.
@inproceedings{Pato:Vladi:2020, author = {P. Cerda-Mardini and V. Araujo and A. Soto}, title = {Translating Natural Language Instructions for Behavioral Robot Navigation with a Multi-Head Attention Mechanism}, booktitle = {{ACL} WiNLP workshop}, year = {2020}, abstract = {We propose a multi-head attention mechanism as a blending layer in a neural network model that translates natural language to a high level behavioral language for indoor robot navigation. We follow the framework established by (Zang et al., 2018a) that proposes the use of a navigation graph as a knowledge base for the task. Our results show significant performance gains when translating instructions on previously unseen environments, therefore, improving the generalization capabilities of the model.}, url = {https://arxiv.org/abs/2006.00697} }
We propose a multi-head attention mechanism as a blending layer in a neural network model that translates natural language to a high-level behavioral language for indoor robot navigation. We follow the framework established by (Zang et al., 2018a) that proposes the use of a navigation graph as a knowledge base for the task. Our results show significant performance gains when translating instructions on previously unseen environments, thereby improving the generalization capabilities of the model.
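A blending layer of this flavor can be sketched directly with a stock multi-head attention module: instruction-token states attend over an encoding of the navigation-graph knowledge base. The shapes and the query/key assignment are our assumptions, not the paper's configuration.

import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
instruction = torch.randn(2, 12, 64)   # (batch, tokens, dim) instruction states
graph_nodes = torch.randn(2, 30, 64)   # (batch, nodes, dim) navigation-graph encoding
blended, weights = attn(query=instruction, key=graph_nodes, value=graph_nodes)
print(blended.shape, weights.shape)    # (2, 12, 64), (2, 12, 30)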
Autonomous Robotic System For Automatically Monitoring the State of Shelves in Shops.
Soto, A.
WO/2019/126888. 2020.
@article{ASoto:ZippediPatente:2020, Author = {A. Soto}, Title = {Autonomous Robotic System For Automatically Monitoring the State of Shelves in Shops}, Journal = {WO/2019/126888}, Year = {2020}, abstract = {The invention relates to an autonomous robotic system, for automatically monitoring the state of shelves in shops, which comprises: a mobile robot made up of a body which comprises: a mobile base that comprises a drive system connected to movement and steering means; an upper structure provided for housing sensors, at least one processing unit and communications means, in which said sensors comprise: at least one laser sensor; at least one distance, depth or proximity sensor; and at least one image sensor; a navigation system in communication with at least one laser sensor, with at least one image sensor, with at least one distance or proximity sensor and with at least one processor; a recognition system in communication with at least one image sensor, with at least one distance or proximity sensor, with at least one processing unit and with the communication means; and a multi-objective planning system in communication with at least one processing unit and with the navigation system.}, url = {https://patentscope.wipo.int/search/es/detail.jsf?docId=WO2019126888&tab=PCTBIBLIO} }
The invention relates to an autonomous robotic system, for automatically monitoring the state of shelves in shops, which comprises: a mobile robot made up of a body which comprises: a mobile base that comprises a drive system connected to movement and steering means; an upper structure provided for housing sensors, at least one processing unit and communications means, in which said sensors comprise: at least one laser sensor; at least one distance, depth or proximity sensor; and at least one image sensor; a navigation system in communication with at least one laser sensor, with at least one image sensor, with at least one distance or proximity sensor and with at least one processor; a recognition system in communication with at least one image sensor, with at least one distance or proximity sensor, with at least one processing unit and with the communication means; and a multi-objective planning system in communication with at least one processing unit and with the navigation system.
2019 (4)
A Behavioral Approach to Visual Navigation with Graph Localization Networks.
Chen, K.; Vicente, J. D.; Sepulveda, G.; Xia, F.; Soto, A.; Vazquez, M.; and Savarese, S.
In RSS, 2019.
@inproceedings{Kevin:EtAl:2018, Author = {K. Chen and J.P De Vicente and G. Sepulveda and F. Xia and A. Soto and M. Vazquez and S. Savarese}, Title = {A Behavioral Approach to Visual Navigation with Graph Localization Networks}, booktitle = {RSS}, year = {2019}, abstract = {Inspired by research in psychology, we introduce a behavioral approach for visual navigation using topological maps. Our goal is to enable a robot to navigate from one location to another, relying only on its visual observations and the topological map of the environment. To this end, we propose using graph neural networks for localizing the agent in the map, and decompose the action space into primitive behaviors implemented as convolutional or recurrent neural networks. Using the Gibson simulator and the Stanford 2D-3D-S dataset, we verify that our approach outperforms relevant baselines and is able to navigate in both seen and unseen indoor environments.}, url={https://graphnav.stanford.edu/}, }
Inspired by research in psychology, we introduce a behavioral approach for visual navigation using topological maps. Our goal is to enable a robot to navigate from one location to another, relying only on its visual observations and the topological map of the environment. To this end, we propose using graph neural networks for localizing the agent in the map, and decompose the action space into primitive behaviors implemented as convolutional or recurrent neural networks. Using the Gibson simulator and the Stanford 2D-3D-S dataset, we verify that our approach outperforms relevant baselines and is able to navigate in both seen and unseen indoor environments.
Interpretable Visual Question Answering by Visual Grounding from Automatic Attention Annotations.
Zhang, B.; Niebles, J.; and Soto, A.
In WACV, 2019.
@inproceedings{Ben:EtAl:2019, Author = {B. Zhang and JC. Niebles and A. Soto}, Title = {Interpretable Visual Question Answering by Visual Grounding from Automatic Attention Annotations}, booktitle = {WACV}, year = {2019}, abstract = {A key aspect of VQA models that are interpretable is their ability to ground their answers to relevant regions in the image. Current approaches with this capability rely on supervised learning and human annotated groundings to train attention mechanisms inside the VQA architecture. Unfortunately, obtaining human annotations specific for visual grounding is difficult and expensive. In this work, we demonstrate that we can effectively train a VQA architecture with grounding supervision that can be automatically obtained from available region descriptions and object annotations. We also show that our model trained with this mined supervision generates visual groundings that achieve higher correlation to manually-annotated groundings than alternative approaches, even in the case of state-of-the-art algorithms that are directly trained with human grounding annotations.}, url={https://arxiv.org/abs/1808.00265}, }
A key aspect of VQA models that are interpretable is their ability to ground their answers to relevant regions in the image. Current approaches with this capability rely on supervised learning and human annotated groundings to train attention mechanisms inside the VQA architecture. Unfortunately, obtaining human annotations specific for visual grounding is difficult and expensive. In this work, we demonstrate that we can effectively train a VQA architecture with grounding supervision that can be automatically obtained from available region descriptions and object annotations. We also show that our model trained with this mined supervision generates visual groundings that achieve higher correlation to manually-annotated groundings than alternative approaches, even in the case of state-of-the-art algorithms that are directly trained with human grounding annotations.
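To make the mined supervision concrete, one simple option is to treat each mined grounding as a target attention distribution over image regions and penalize the divergence of the model's attention from it. A minimal sketch, assuming both maps are non-negative and row-normalizable; the exact loss used in the paper is not reproduced here.

import numpy as np

def grounding_loss(att, mined_att, eps=1e-12):
    # att, mined_att: (n, r) attention weights over r image regions.
    att = att / (att.sum(axis=1, keepdims=True) + eps)
    mined = mined_att / (mined_att.sum(axis=1, keepdims=True) + eps)
    # KL(mined || predicted), averaged over the batch.
    return np.mean(np.sum(mined * (np.log(mined + eps) - np.log(att + eps)), axis=1))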
Mixture of Experts with Entropic Regularization for Data Classification.
Peralta, B.; Saavedra, A.; Caro, L.; and Soto, A.
Entropy, 21(2). 2019.
Paper
link
bibtex
abstract
@article{Peralta:EtAl:2019, Author = {B. Peralta and A. Saavedra and L. Caro and A. Soto}, Title = {Mixture of Experts with Entropic Regularization for Data Classification}, Journal = {Entropy}, Volume = {21}, Number = {2}, Year = {2019}, abstract = {Today, there is growing interest in the automatic classification of a variety of tasks, such as weather forecasting, product recommendations, intrusion detection, and people recognition.“Mixture-of-experts” is a well-known classification technique; it is a probabilistic model consisting of local expert classifiers weighted by a gate network that is typically based on softmax functions, combined with learnable complex patterns in data. In this scheme, one data point is influenced by only one expert; as a result, the training process can be misguided in real datasets for which complex data need to be explained by multiple experts. In this work, we propose a variant of the regular mixture-of-experts model. In the proposed model, the cost classification is penalized by the Shannon entropy of the gating network in order to avoid a “winner-takes-all” output for the gating network. Experiments show the advantage of our approach using several real datasets, with improvements in mean accuracy of 3–6\% in some datasets. In future work, we plan to embed feature selection into this model.}, url = {https://www.mdpi.com/1099-4300/21/2/190} }
Today, there is growing interest in the automatic classification of a variety of tasks, such as weather forecasting, product recommendations, intrusion detection, and people recognition. “Mixture-of-experts” is a well-known classification technique; it is a probabilistic model consisting of local expert classifiers weighted by a gate network that is typically based on softmax functions and can learn complex patterns in data. In this scheme, one data point is influenced by only one expert; as a result, the training process can be misguided in real datasets for which complex data need to be explained by multiple experts. In this work, we propose a variant of the regular mixture-of-experts model. In the proposed model, the classification cost is penalized by the Shannon entropy of the gating network in order to avoid a “winner-takes-all” output for the gating network. Experiments show the advantage of our approach using several real datasets, with improvements in mean accuracy of 3–6% in some datasets. In future work, we plan to embed feature selection into this model.
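As a worked example of the idea, the sketch below adds a Shannon-entropy term on the gate responsibilities to a standard mixture-of-experts log-loss; subtracting the entropy, weighted by a hypothetical coefficient lam, penalizes near one-hot gate outputs so that several experts can explain each point. This is one plausible reading of the objective, not the paper's exact formulation.

import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def entropic_moe_loss(x, y_onehot, gate_W, expert_Ws, lam=0.1):
    # x: (n, d) inputs; y_onehot: (n, c) labels.
    # gate_W: (d, K); expert_Ws: list of K (d, c) expert weight matrices.
    g = softmax(x @ gate_W)                                        # (n, K) gate
    probs = np.stack([softmax(x @ W) for W in expert_Ws], axis=1)  # (n, K, c)
    mix = np.einsum('nk,nkc->nc', g, probs)                        # mixture prediction
    ce = -np.mean(np.sum(y_onehot * np.log(mix + 1e-12), axis=1))  # log-loss
    gate_entropy = -np.mean(np.sum(g * np.log(g + 1e-12), axis=1))
    # Lower loss for higher gate entropy: discourages winner-takes-all gating.
    return ce - lam * gate_entropy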
Content-based artwork recommendation: integrating painting metadata with neural and manually-engineered visual features.
Messina, P.; Dominguez, V.; Parra, D.; Trattner, C.; and Soto, A.
User Modeling and User-Adapted Interaction, 29(2). 2019.
Paper
link
bibtex
abstract
@article{Messina:EtAl:2019, Author = {Pablo Messina and Vicente Dominguez and Denis Parra and Christoph Trattner and Alvaro Soto}, Title = {Content-based artwork recommendation: integrating painting metadata with neural and manually-engineered visual features}, Journal = {User Modeling and User-Adapted Interaction}, Volume = {29}, Number = {2}, Year = {2019}, abstract = {Recommender Systems help us deal with information overload by suggesting relevant items based on our personal preferences. Although there is a large body of research in areas such as movies or music, artwork recommendation has received comparatively little attention, despite the continuous growth of the artwork market. Most previous research has relied on ratings and metadata, and a few recent works have exploited visual features extracted with deep neural networks (DNN) to recommend digital art. In this work, we contribute to the area of content-based artwork recommendation of physical paintings by studying the impact of the aforementioned features (artwork metadata, neural visual features), as well as manually-engineered visual features, such as naturalness, brightness and contrast. We implement and evaluate our method using transactional data from UGallery.com, an online artwork store. Our results show that artwork recommendations based on a hybrid combination of artist preference, curated attributes, deep neural visual features and manually-engineered visual features produce the best performance. Moreover, we discuss the trade-off between automatically obtained DNN features and manually-engineered visual features for the purpose of explainability, as well as the impact of user profile size on predictions. Our research informs the development of next-generation content-based artwork recommenders which rely on different types of data, from text to multimedia.}, url = {https://link.springer.com/article/10.1007/s11257-018-9206-9} } %***********2018***************%
Recommender Systems help us deal with information overload by suggesting relevant items based on our personal preferences. Although there is a large body of research in areas such as movies or music, artwork recommendation has received comparatively little attention, despite the continuous growth of the artwork market. Most previous research has relied on ratings and metadata, and a few recent works have exploited visual features extracted with deep neural networks (DNN) to recommend digital art. In this work, we contribute to the area of content-based artwork recommendation of physical paintings by studying the impact of the aforementioned features (artwork metadata, neural visual features), as well as manually-engineered visual features, such as naturalness, brightness and contrast. We implement and evaluate our method using transactional data from UGallery.com, an online artwork store. Our results show that artwork recommendations based on a hybrid combination of artist preference, curated attributes, deep neural visual features and manually-engineered visual features produce the best performance. Moreover, we discuss the trade-off between automatically obtained DNN features and manually-engineered visual features for the purpose of explainability, as well as the impact of user profile size on predictions. Our research informs the development of next-generation content-based artwork recommenders which rely on different types of data, from text to multimedia.
2018
(4)
End-to-End Joint Semantic Segmentation of Actors and Actions in Video.
Ji, J.; Buch, S.; Niebles, J.; and Soto, A.
In ECCV, 2018.
Paper
link
bibtex
abstract
@inproceedings{Jingwei:EtAl:2018, Author = {J. Ji and S. Buch and JC. Niebles and A. Soto}, Title = {End-to-End Joint Semantic Segmentation of Actors and Actions in Video}, booktitle = {{ECCV}}, year = {2018}, abstract = {Traditional video understanding tasks include human action recognition and actor-object semantic segmentation. However, the joint task of providing semantic segmentation for different actor classes simultaneously with their action class remains a challenging but necessary task for many applications. In this work, we propose a new end-to-end architecture for tackling this joint task in videos. Our model effectively leverages multiple input modalities, contextual information, and joint multitask learning in the video to directly output semantic segmentations in a single unified framework. We train and benchmark our model on the large-scale Actor-Action Dataset (A2D) for joint actor-action semantic segmentation, and demonstrate state-of-the-art performance for both segmentation and detection. We also perform experiments verifying our joint approach improves performance for zero-shot understanding, indicating generalizability of our jointly learned feature space.}, url={http://svl.stanford.edu/assets/papers/ji2018eccv.pdf}, }
Traditional video understanding tasks include human action recognition and actor-object semantic segmentation. However, the joint task of providing semantic segmentation for different actor classes simultaneously with their action class remains a challenging but necessary task for many applications. In this work, we propose a new end-to-end architecture for tackling this joint task in videos. Our model effectively leverages multiple input modalities, contextual information, and joint multitask learning in the video to directly output semantic segmentations in a single unified framework. We train and benchmark our model on the large-scale Actor-Action Dataset (A2D) for joint actor-action semantic segmentation, and demonstrate state-of-the-art performance for both segmentation and detection. We also perform experiments verifying our joint approach improves performance for zero-shot understanding, indicating generalizability of our jointly learned feature space.
Translating Navigation Instructions in Natural Language to a High-Level Plan for Behavioral Robot Navigation.
Zang, X.; Pokle, A.; Chen, K.; Vazquez, M.; Niebles, J.; Soto, A.; and Savarese, S.
In EMNLP, 2018.
Paper
link
bibtex
abstract
@inproceedings{Xiaoxue:EtAl:2018, Author = {X. Zang and A. Pokle and K. Chen and M. Vazquez and JC. Niebles and A. Soto and S. Savaresse}, Title = {Translating Navigation Instructions in Natural Language to a High-Level Plan for Behavioral Robot Navigation}, booktitle = {EMNLP}, year = {2018}, abstract = {We propose an end-to-end deep learning model for translating free-form natural language instructions to a high-level plan for behavioral robot navigation. We use attention models to connect information from both the user instructions and a topological representation of the environment. We evaluate our model's performance on a new dataset containing 10,050 pairs of navigation instructions. Our model significantly outperforms baseline approaches. Furthermore, our results suggest that it is possible to leverage the environment map as a relevant knowledge base to facilitate the translation of free-form navigational instruction.}, url={https://arxiv.org/abs/1810.00663}, }
We propose an end-to-end deep learning model for translating free-form natural language instructions to a high-level plan for behavioral robot navigation. We use attention models to connect information from both the user instructions and a topological representation of the environment. We evaluate our model's performance on a new dataset containing 10,050 pairs of navigation instructions. Our model significantly outperforms baseline approaches. Furthermore, our results suggest that it is possible to leverage the environment map as a relevant knowledge base to facilitate the translation of free-form navigational instructions.
Behavioral Indoor Navigation With Natural Language Directions.
Zang, X.; Vázquez, M.; Niebles, J.; Soto, A.; and Savarese, S.
In HRI, 2018.
Paper
link
bibtex
abstract
@inproceedings{Zang:EtAl:2018, Author = {X. Zang and M. Vázquez and JC. Niebles and A. Soto and S. Savarese}, Title = {Behavioral Indoor Navigation With Natural Language Directions}, booktitle = {{HRI}}, year = {2018}, abstract = {We describe a behavioral navigation approach that leverages the rich semantic structure of human environments to enable robots to navigate without an explicit geometric representation of the world. Based on this approach, we then present our efforts to allow robots to follow navigation instructions in natural language. With our proof-of-concept implementation, we were able to translate natural language navigation commands into a sequence of behaviors that could then be executed by a robot to reach a desired goal.}, url={http://www.marynel.net/static/pdfs/zang-HRI18.pdf}, }
We describe a behavioral navigation approach that leverages the rich semantic structure of human environments to enable robots to navigate without an explicit geometric representation of the world. Based on this approach, we then present our efforts to allow robots to follow navigation instructions in natural language. With our proof-of-concept implementation, we were able to translate natural language navigation commands into a sequence of behaviors that could then be executed by a robot to reach a desired goal.
A Deep Learning Based Behavioral Approach to Indoor Autonomous Navigation.
Sepulveda, G.; Niebles, J.; and Soto, A.
In ICRA, 2018.
Paper
link
bibtex
abstract
@inproceedings{Sepulveda:EtAl:2018, Author = {G. Sepulveda and JC. Niebles and A. Soto}, Title = {A Deep Learning Based Behavioral Approach to Indoor Autonomous Navigation}, booktitle = {{ICRA}}, year = {2018}, abstract = {We present a semantically rich graph representa- tion for indoor robotic navigation. Our graph representation encodes: semantic locations such as offices or corridors as nodes, and navigational behaviors such as enter office or cross a corridor as edges. In particular, our navigational behaviors operate directly from visual inputs to produce motor controls and are implemented with deep learning architectures. This enables the robot to avoid explicit computation of its precise location or the geometry of the environment, and enables navigation at a higher level of semantic abstraction. We evaluate the effectiveness of our representation by simulating navigation tasks in a large number of virtual environments. Our results show that using a simple sets of perceptual and navigational behaviors, the proposed approach can successfully guide the way of the robot as it completes navigational missions such as going to a specific office. Furthermore, our implementation shows to be effective to control the selection and switching of behaviors. }, url={https://arxiv.org/pdf/1803.04119v1.pdf}, } %***********2017***************%
We present a semantically rich graph representation for indoor robotic navigation. Our graph representation encodes semantic locations, such as offices or corridors, as nodes, and navigational behaviors, such as entering an office or crossing a corridor, as edges. In particular, our navigational behaviors operate directly from visual inputs to produce motor controls and are implemented with deep learning architectures. This enables the robot to avoid explicit computation of its precise location or the geometry of the environment, and enables navigation at a higher level of semantic abstraction. We evaluate the effectiveness of our representation by simulating navigation tasks in a large number of virtual environments. Our results show that, using a simple set of perceptual and navigational behaviors, the proposed approach can successfully guide the robot as it completes navigational missions such as going to a specific office. Furthermore, our implementation proves effective at controlling the selection and switching of behaviors.
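The graph representation is easy to state concretely: places are nodes, behaviors label directed edges, and a navigational mission reduces to reading off the edge labels along a path. A toy sketch using networkx; the place and behavior names are hypothetical.

import networkx as nx

G = nx.DiGraph()
G.add_edge('corridor-1', 'office-3', behavior='enter-office-right')
G.add_edge('office-3', 'corridor-1', behavior='exit-office')
G.add_edge('corridor-1', 'corridor-2', behavior='cross-corridor')
G.add_edge('corridor-2', 'office-7', behavior='enter-office-left')

def plan_behaviors(graph, start, goal):
    # Behaviors along a shortest node path; each behavior is executed by its
    # own deep network in the paper, which this sketch does not model.
    path = nx.shortest_path(graph, start, goal)
    return [graph.edges[u, v]['behavior'] for u, v in zip(path, path[1:])]

print(plan_behaviors(G, 'corridor-1', 'office-7'))
# ['cross-corridor', 'enter-office-left']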
2017
(5)
How a General-Purpose Commonsense Ontology can Improve Performance of Learning-Based Image Retrieval.
Toro, R.; Baier, J.; Ruz, C.; and Soto, A.
In IJCAI, 2017.
Paper
link
bibtex
abstract
@inproceedings{Toro:EtAl:2017, Author = {R. Toro and J. Baier and C. Ruz and A. Soto}, Title = {How a General-Purpose Commonsense Ontology can Improve Performance of Learning-Based Image Retrieval}, booktitle = {{IJCAI}}, year = {2017}, abstract = {The knowledge representation community has built general-purpose ontologies which contain large amounts of commonsense knowledge over relevant aspects of the world, including useful visual information, e.g.: "a ball is used by a football player", "a tennis player is located at a tennis court". Current state-of-the-art approaches for visual recognition do not exploit these rule-based knowledge sources. Instead, they learn recognition models directly from training examples. In this paper, we study how general-purpose ontologies---specifically, MIT's ConceptNet ontology---can improve the performance of state-of-the-art vision systems. As a testbed, we tackle the problem of sentence-based image retrieval. Our retrieval approach incorporates knowledge from ConceptNet on top of a large pool of object detectors derived from a deep learning technique. In our experiments, we show that ConceptNet can improve performance on a common benchmark dataset. Key to our performance is the use of the ESPGAME dataset to select visually relevant relations from ConceptNet. Consequently, a main conclusion of this work is that general-purpose commonsense ontologies improve performance on visual reasoning tasks when properly filtered to select meaningful visual relations. }, url={https://arxiv.org/abs/1705.08844}, }
The knowledge representation community has built general-purpose ontologies which contain large amounts of commonsense knowledge over relevant aspects of the world, including useful visual information, e.g.: "a ball is used by a football player", "a tennis player is located at a tennis court". Current state-of-the-art approaches for visual recognition do not exploit these rule-based knowledge sources. Instead, they learn recognition models directly from training examples. In this paper, we study how general-purpose ontologies—specifically, MIT's ConceptNet ontology—can improve the performance of state-of-the-art vision systems. As a testbed, we tackle the problem of sentence-based image retrieval. Our retrieval approach incorporates knowledge from ConceptNet on top of a large pool of object detectors derived from a deep learning technique. In our experiments, we show that ConceptNet can improve performance on a common benchmark dataset. Key to our performance is the use of the ESPGAME dataset to select visually relevant relations from ConceptNet. Consequently, a main conclusion of this work is that general-purpose commonsense ontologies improve performance on visual reasoning tasks when properly filtered to select meaningful visual relations.
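The filtering step can be illustrated with a toy rule: keep a ConceptNet-style relation only if its two endpoint terms co-occur often enough as tags of the same images. The triples, counts, and threshold below are invented for illustration; the paper's actual selection criterion over ESPGAME is more involved.

triples = [
    ('ball', 'UsedBy', 'football player'),
    ('tennis player', 'AtLocation', 'tennis court'),
    ('idea', 'RelatedTo', 'freedom'),
]
cooccur = {('ball', 'football player'): 212,
           ('tennis player', 'tennis court'): 187,
           ('idea', 'freedom'): 1}

def visually_relevant(triples, cooccur, min_count=10):
    # Keep relations whose endpoints frequently co-occur as image tags.
    return [t for t in triples if cooccur.get((t[0], t[2]), 0) >= min_count]

print(visually_relevant(triples, cooccur))
# drops ('idea', 'RelatedTo', 'freedom'): abstract relations rarely co-occur visually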
GENIUS: web server to predict local gene networks and key genes for biological functions.
Puelma, T.; Araus, V.; Canales, J.; Vidal, E.; Cabello, J.; Soto, A.; and Gutierrez, R.
Bioinformatics, 28(17): 2256-64. 2017.
Paper
link
bibtex
abstract
@article{Puelma:EtAl:2016, Author = {T. Puelma and V. Araus and J. Canales and E. Vidal and J. Cabello and A. Soto and R. Gutierrez}, Title = {GENIUS: web server to predict local gene networks and key genes for biological functions}, Journal = {Bioinformatics}, Volume = {28}, Number = {17}, pages={2256-64}, Year = {2017}, abstract = {GENIUS is a user-friendly web server that uses a novel machine learning algorithm to infer functional gene networks focused on specific genes and experimental conditions that are relevant to biological functions of interest. These functions may have different levels of complexity, from specific biological processes to complex traits that involve several interacting processes. GENIUS also enriches the network with new genes related to the biological function of interest, with accuracies comparable to highly discriminative Support Vector Machine methods}, url = {https://doi.org/10.1093/bioinformatics/btw702} }
GENIUS is a user-friendly web server that uses a novel machine learning algorithm to infer functional gene networks focused on specific genes and experimental conditions that are relevant to biological functions of interest. These functions may have different levels of complexity, from specific biological processes to complex traits that involve several interacting processes. GENIUS also enriches the network with new genes related to the biological function of interest, with accuracies comparable to highly discriminative Support Vector Machine methods.
Sparse composition of body poses and atomic actions for human activity recognition in RGB-D videos.
Lillo, I.; Niebles, J.; and Soto, A.
Image and Vision Computing, 59(March): 63-75. 2017.
Paper
link
bibtex
abstract
@article{Lillo:EtAl:2017, Author = {I. Lillo and JC. Niebles and A. Soto}, Title = {Sparse composition of body poses and atomic actions for human activity recognition in RGB-D videos}, Journal = {Image and Vision Computing}, Volume = {59}, Number = {March}, pages={63-75}, Year = {2017}, abstract = {This paper presents an approach to recognize human activities using body poses estimated from RGB-D data. We focus on recognizing complex activities composed of sequential or simultaneous atomic actions characterized by body motions of a single actor. We tackle this problem by introducing a hierarchical compositional model that operates at three levels of abstraction. At the lowest level, geometric and motion descriptors are used to learn a dictionary of body poses. At the intermediate level, sparse compositions of these body poses are used to obtain meaningful representations for atomic human actions. Finally, at the highest level, spatial and temporal compositions of these atomic actions are used to represent complex human activities. Our results show the benefits of using a hierarchical model that exploits the sharing and composition of body poses into atomic actions, and atomic actions into activities. A quantitative evaluation using two benchmark datasets illustrates the advantages of our model to perform action and activity recognition.}, url={http://www.sciencedirect.com/science/article/pii/S0262885616301949} }
This paper presents an approach to recognize human activities using body poses estimated from RGB-D data. We focus on recognizing complex activities composed of sequential or simultaneous atomic actions characterized by body motions of a single actor. We tackle this problem by introducing a hierarchical compositional model that operates at three levels of abstraction. At the lowest level, geometric and motion descriptors are used to learn a dictionary of body poses. At the intermediate level, sparse compositions of these body poses are used to obtain meaningful representations for atomic human actions. Finally, at the highest level, spatial and temporal compositions of these atomic actions are used to represent complex human activities. Our results show the benefits of using a hierarchical model that exploits the sharing and composition of body poses into atomic actions, and atomic actions into activities. A quantitative evaluation using two benchmark datasets illustrates the advantages of our model to perform action and activity recognition.
Unsupervised Local Regressive Attributes for Pedestrian Re-Identification.
Peralta, B.; Caro, L.; and Soto, A.
In CIARP, 2017.
Paper
link
bibtex
abstract
@inproceedings{Peralta:EtAl:2017, Author = {B. Peralta and L. Caro and A. Soto}, Title = {Unsupervised Local Regressive Attributes for Pedestrian Re-Identification}, booktitle = {{CIARP}}, year = {2017}, abstract = {.}, url={http://saturno.ing.puc.cl/media/papers_alvaro/}, }
Comparing Neural and Attractiveness-based Visual Features for Artwork Recommendation.
Dominguez, V.; Messina, P.; Parra, D.; Mery, D.; Trattner, C.; and Soto, A.
In Workshop on Deep Learning for Recommender Systems, co-located at RecSys 2017, 2017.
Paper
link
bibtex
@inproceedings{Dominguez:EtAl:2017, Author = {V. Dominguez and P. Messina and D. Parra and D. Mery and C. Trattner and A. Soto}, Title = {Comparing Neural and Attractiveness-based Visual Features for Artwork Recommendation}, Year = {2017}, url={https://arxiv.org/pdf/1706.07515.pdf}, booktitle={Workshop on Deep Learning for Recommender Systems, co-located at RecSys 2017} } %***********2016***************%
2016
(3)
A Proposal for Supervised Clustering with Dirichlet Process Using Labels.
Peralta, B.; Caro, A.; and Soto, A.
Pattern Recognition Letters, 80: 52-57. 2016.
Paper
link
bibtex
abstract
@article{Peralta:EtAl:2016, Author = {B. Peralta and A. Caro and A. Soto}, Title = {A Proposal for Supervised Clustering with Dirichlet Process Using Labels}, Journal = {Pattern Recognition Letters}, Volume = {80}, pages={52-57}, Year = {2016}, abstract = {Supervised clustering is an emerging area of machine learning, where the goal is to find class-uniform clusters. However, typical state-of-the-art algorithms use a fixed number of clusters. In this work, we propose a variation of a non-parametric Bayesian modeling for supervised clustering. Our approach consists of modeling the clusters as a mixture of Gaussians with the constraint of encouraging clusters of points with the same label. In order to estimate the number of clusters, we assume a-priori a countably infinite number of clusters using a variation of Dirichlet Process model over the prior distribution. In our experiments, we show that our technique typically outperforms the results of other clustering techniques.}, url={http://www.sciencedirect.com/science/article/pii/S0167865516300976} }
Supervised clustering is an emerging area of machine learning, where the goal is to find class-uniform clusters. However, typical state-of-the-art algorithms use a fixed number of clusters. In this work, we propose a variation of non-parametric Bayesian modeling for supervised clustering. Our approach consists of modeling the clusters as a mixture of Gaussians with the constraint of encouraging clusters of points with the same label. In order to estimate the number of clusters, we assume a priori a countably infinite number of clusters, using a variation of the Dirichlet Process model as the prior distribution. In our experiments, we show that our technique typically outperforms the results of other clustering techniques.
A Hierarchical Pose-Based Approach to Complex Action Understanding Using Dictionaries of Actionlets and Motion Poselets.
Lillo, I.; Niebles, J.; and Soto, A.
In CVPR, 2016.
Paper
link
bibtex
abstract
@inproceedings{Lillo:EtAl:2016, Author = {I. Lillo and JC. Niebles and A. Soto}, Title = {A Hierarchical Pose-Based Approach to Complex Action Understanding Using Dictionaries of Actionlets and Motion Poselets}, booktitle = {{CVPR}}, year = {2016}, abstract = {In this paper, we introduce a new hierarchical model for human action recognition using body joint locations. Our model can categorize complex actions in videos, and perform spatio-temporal annotations of the atomic actions that compose the complex action being performed. That is, for each atomic action, the model generates temporal action annotations by estimating its starting and ending times, as well as, spatial annotations by inferring the human body parts that are involved in executing the action. Our model includes three key novel properties: (i) it can be trained with no spatial supervision, as it can automatically discover active body parts from temporal action annotations only; (ii) it jointly learns flexible representations for motion poselets and actionlets that encode the visual variability of body parts and atomic actions; (iii) a mechanism to discard idle or non-informative body parts which increases its robustness to common pose estimation errors. We evaluate the performance of our method using multiple action recognition benchmarks. Our model consistently outperforms baselines and state-of-the-art action recognition methods.}, url={http://saturno.ing.puc.cl/media/papers_alvaro/FinalVersionActivities-CVPR-2016.pdf}, }
In this paper, we introduce a new hierarchical model for human action recognition using body joint locations. Our model can categorize complex actions in videos and perform spatio-temporal annotations of the atomic actions that compose the complex action being performed. That is, for each atomic action, the model generates temporal action annotations by estimating its starting and ending times, as well as spatial annotations by inferring the human body parts that are involved in executing the action. Our model includes three key novel properties: (i) it can be trained with no spatial supervision, as it can automatically discover active body parts from temporal action annotations only; (ii) it jointly learns flexible representations for motion poselets and actionlets that encode the visual variability of body parts and atomic actions; (iii) it includes a mechanism to discard idle or non-informative body parts, which increases its robustness to common pose estimation errors. We evaluate the performance of our method using multiple action recognition benchmarks. Our model consistently outperforms baselines and state-of-the-art action recognition methods.
Action Recognition in Video Using Sparse Coding and Relative Features.
Alfaro, A.; Mery, D.; and Soto, A.
In CVPR, 2016.
Paper
link
bibtex
abstract
@inproceedings{Anali:EtAl:2016, author = {A. Alfaro and D. Mery and A. Soto}, title = {Action Recognition in Video Using Sparse Coding and Relative Features}, booktitle = {{CVPR}}, year = {2016}, abstract = {This work presents an approach to category-based action recognition in video using sparse coding techniques. The proposed approach includes two main contributions: i) A new method to handle intra-class variations by decomposing each video into a reduced set of representative atomic action acts or key-sequences, and ii) A new video descriptor, ITRA: Inter-Temporal Relational Act Descriptor, that exploits the power of comparative reasoning to capture relative similarity relations among key-sequences. In terms of the method to obtain key-sequences, we introduce a loss function that, for each video, leads to the identification of a sparse set of representative key-frames capturing both, relevant particularities arising in the input video, as well as relevant generalities arising in the complete class collection. In terms of the method to obtain the ITRA descriptor, we introduce a novel scheme to quantify relative intra and inter-class similarities among local temporal patterns arising in the videos. The resulting ITRA descriptor demonstrates to be highly effective to discriminate among action categories. As a result, the proposed approach reaches remarkable action recognition performance on several popular benchmark datasets, outperforming alternative state-of-the-art techniques by a large margin.}, url = {http://saturno.ing.puc.cl/media/papers_alvaro/FinalVersion-Anali-CVPR-2016.pdf}, } %***********2015***************%
This work presents an approach to category-based action recognition in video using sparse coding techniques. The proposed approach includes two main contributions: i) a new method to handle intra-class variations by decomposing each video into a reduced set of representative atomic acts, or key-sequences, and ii) a new video descriptor, ITRA: Inter-Temporal Relational Act Descriptor, that exploits the power of comparative reasoning to capture relative similarity relations among key-sequences. In terms of the method to obtain key-sequences, we introduce a loss function that, for each video, leads to the identification of a sparse set of representative key-frames capturing both relevant particularities arising in the input video and relevant generalities arising in the complete class collection. In terms of the method to obtain the ITRA descriptor, we introduce a novel scheme to quantify relative intra- and inter-class similarities among local temporal patterns arising in the videos. The resulting ITRA descriptor proves highly effective at discriminating among action categories. As a result, the proposed approach reaches remarkable action recognition performance on several popular benchmark datasets, outperforming alternative state-of-the-art techniques by a large margin.
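For readers unfamiliar with the sparse-coding step that both contributions build on, here is a minimal ISTA solver for the standard encoding problem min_a 0.5*||x - D a||^2 + lam*||a||_1 over a fixed dictionary D. Dictionary learning and the ITRA construction itself are omitted; lam and the iteration count are hypothetical settings.

import numpy as np

def ista(D, x, lam=0.1, steps=200):
    # D: (d, k) dictionary with atoms as columns; x: (d,) signal to encode.
    a = np.zeros(D.shape[1])
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    for _ in range(steps):
        grad = D.T @ (D @ a - x)             # gradient of the quadratic term
        z = a - grad / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return a                                 # sparse code of x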
2015
(3)
Learning Shared, Discriminative, and Compact Representations for Visual Recognition.
Lobel, H.; Vidal, R.; and Soto, A.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(11). 2015.
Paper
link
bibtex
abstract
@article{Lobel:EtAl:2015, Author = {H. Lobel and R. Vidal and A. Soto}, Title = {Learning Shared, Discriminative, and Compact Representations for Visual Recognition}, Journal = {{IEEE} Transactions on Pattern Analysis and Machine Intelligence}, Volume = {37}, Number = {11}, Year = {2015}, abstract = {Dictionary-based and part-based methods are among the most popular approaches to visual recognition. In both methods, a mid-level representation is built on top of low-level image descriptors and high-level classifiers are trained on top of the mid-level representation. While earlier methods built the mid-level representation without supervision, there is currently great interest in learning both representations jointly to make the mid-level representation more discriminative. In this work we propose a new approach to visual recognition that jointly learns a shared, discriminative, and compact mid-level representation and a compact high-level representation. By using a structured output learning framework, our approach directly handles the multiclass case at both levels of abstraction. Moreover, by using a group-sparse prior in the structured output learning framework, our approach encourages sharing of visual words and thus reduces the number of words used to represent each class. We test our proposed method on several popular benchmarks. Our results show that, by jointly learning mid- and high-level representations, and fostering the sharing of discriminative visual words among target classes, we are able to achieve state-of-the-art recognition performance using far less visual words than previous approaches.}, url = {http://saturno.ing.puc.cl/media/papers_alvaro/Hans-FINAL-PAMI-2015.pdf} }
Dictionary-based and part-based methods are among the most popular approaches to visual recognition. In both methods, a mid-level representation is built on top of low-level image descriptors and high-level classifiers are trained on top of the mid-level representation. While earlier methods built the mid-level representation without supervision, there is currently great interest in learning both representations jointly to make the mid-level representation more discriminative. In this work we propose a new approach to visual recognition that jointly learns a shared, discriminative, and compact mid-level representation and a compact high-level representation. By using a structured output learning framework, our approach directly handles the multiclass case at both levels of abstraction. Moreover, by using a group-sparse prior in the structured output learning framework, our approach encourages sharing of visual words and thus reduces the number of words used to represent each class. We test our proposed method on several popular benchmarks. Our results show that, by jointly learning mid- and high-level representations, and fostering the sharing of discriminative visual words among target classes, we are able to achieve state-of-the-art recognition performance using far fewer visual words than previous approaches.
Visual Recognition to Access and Analyze People Density and Flow Patterns in Indoor Environments.
Ruz, C.; Pieringer, C.; Peralta, B.; Lillo, I.; Espinace, P.; Gonzalez, R.; Wendt, B.; Mery, D.; and Soto, A.
In IEEE Winter Conference on Applications of Computer Vision (WACV), 2015.
Paper
link
bibtex
abstract
@inproceedings{Ruz:EtAl:2015, Author = {C. Ruz, C. Pieringer, B. Peralta, I. Lillo, P. Espinace, R. Gonzalez, B. Wendt, D. Mery, A. Soto}, Title = {Visual Recognition to Access and Analyze People Density and Flow Patterns in Indoor Environments}, booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)}, year = {2015}, abstract = {This work describes our experience developing a system to access density and flow of people in large indoor spaces using a network of RGB cameras. The proposed system is based on a set of overlapped and calibrated cameras. This facilitates the use of geometric constraints that help to reduce visual ambiguities. These constraints are combined with classifiers based on visual appearance to produce an efficient and robust method to detect and track humans. In this work, we argue that flow and density of people are low level measurements that need to be complemented with suitable analytic tools to bridge semantic gaps and become useful information for a target application. Consequently, we also propose a set of analytic tools that help a human user to effectively take advantage of the measurements provided by the system. Finally, we report results that demonstrate the relevance of the proposed ideas.}, url = {http://saturno.ing.puc.cl/media/papers_alvaro/WACV-2015-VersionFinal.pdf} }
This work describes our experience developing a system to access density and flow of people in large indoor spaces using a network of RGB cameras. The proposed system is based on a set of overlapped and calibrated cameras. This facilitates the use of geometric constraints that help to reduce visual ambiguities. These constraints are combined with classifiers based on visual appearance to produce an efficient and robust method to detect and track humans. In this work, we argue that flow and density of people are low level measurements that need to be complemented with suitable analytic tools to bridge semantic gaps and become useful information for a target application. Consequently, we also propose a set of analytic tools that help a human user to effectively take advantage of the measurements provided by the system. Finally, we report results that demonstrate the relevance of the proposed ideas.
Towards Improving Top-N Recommendation by Generalization of SLIM.
Larrain, S.; Parra, D.; and Soto, A.
In ACM RecSys, 2015.
Paper
link
bibtex
abstract
@inproceedings{gSLIM, Author = {S. Larrain and D. Parra and A. Soto}, Booktitle = {ACM RecSys}, Title = {Towards Improving Top-N Recommendation by Generalization of SLIM}, url = {http://web.ing.puc.cl/~dparra/pdfs/Improving_SLIM_Recommendation.pdf}, Year = {2015}, abstract = {Sparse Linear Methods (SLIM) are state-of-the-art recommendation approaches based on matrix factorization, which rely on a regularized l1-norm and l2-norm optimization –an alternative optimization problem to the traditional Frobenious norm. Although they have shown outstanding performance in Top-N recommendation, existent works have not yet analyzed some inherent assumptions that can have an important effect on the performance of these algorithms. In this paper, we attempt to improve the performance of SLIM by proposing a generalized formulation of the aforementioned assumptions. Instead of directly learning a sparse representation of the user-item matrix, we (i) learn the latent factors’ matrix of the users and the items via a traditional matrix factorization approach, and then (ii) reconstruct the latent user or item matrix via prototypes which are learned using sparse coding, an alternative SLIM commonly used in the image processing domain. The results show that by tuning the parameters of our generalized model we are able to outperform SLIM in several Top-N recommendation experiments conducted on two different datasets, using both nDCG and nDCG@10 as evaluation metrics. These preliminary results, although not conclusive, indicate a promising line of research to improve the performance of SLIM recommendation.} } %***********2014***************%
Sparse Linear Methods (SLIM) are state-of-the-art recommendation approaches based on matrix factorization, which rely on regularized l1-norm and l2-norm optimization, an alternative to the traditional Frobenius-norm optimization problem. Although they have shown outstanding performance in Top-N recommendation, existing works have not yet analyzed some inherent assumptions that can have an important effect on the performance of these algorithms. In this paper, we attempt to improve the performance of SLIM by proposing a generalized formulation of the aforementioned assumptions. Instead of directly learning a sparse representation of the user-item matrix, we (i) learn the latent factor matrices of the users and the items via a traditional matrix factorization approach, and then (ii) reconstruct the latent user or item matrix via prototypes which are learned using sparse coding, a technique commonly used in the image processing domain. The results show that by tuning the parameters of our generalized model we are able to outperform SLIM in several Top-N recommendation experiments conducted on two different datasets, using both nDCG and nDCG@10 as evaluation metrics. These preliminary results, although not conclusive, indicate a promising line of research to improve the performance of SLIM recommendation.
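A minimal sketch of the two-stage recipe with scikit-learn stand-ins: (i) factorize a toy user-item matrix, then (ii) re-express the item factors as sparse codes over learned prototypes and score items from the reconstruction. The component counts and penalty are hypothetical, and this does not reproduce the paper's models or evaluation.

import numpy as np
from sklearn.decomposition import TruncatedSVD, DictionaryLearning

R = np.random.rand(50, 40)              # toy user-item matrix (placeholder data)

# (i) latent factors via a standard matrix factorization
svd = TruncatedSVD(n_components=8)
user_f = svd.fit_transform(R)           # (50 users, 8)
item_f = svd.components_.T              # (40 items, 8)

# (ii) sparse codes of the item factors over learned prototypes
dl = DictionaryLearning(n_components=12, alpha=0.5, max_iter=200)
codes = dl.fit_transform(item_f)        # (40, 12) sparse codes
item_hat = codes @ dl.components_       # reconstructed item factors

scores = user_f @ item_hat.T            # (50, 40) Top-N recommendation scores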
2014
(4)
Multi-target Tracking with Sparse Group Features and Position Using Discrete-Continuous Optimization.
Peralta, B.; and Soto, A.
In ACCV, 2014.
Paper
link
bibtex
abstract
@inproceedings{Peralta:Soto:2014, Author = {B. Peralta and A. Soto}, Title = {Multi-target Tracking with Sparse Group Features and Position Using Discrete-Continuous Optimization}, booktitle = {{ACCV}}, year = {2014}, abstract = {Multi-target tracking of pedestrians is a challenging task due to uncertainty about targets, caused mainly by similarity between pedestrians, occlusion over a relatively long time and a cluttered background. A usual scheme for tackling multi-target tracking is to divide it into two sub-problems: data association and trajectory estimation. A reasonable approach is based on joint optimization of a discrete model for data association and a continuous model for trajectory estimation in a Markov Random Field framework. Nonetheless, usual solutions of the data association problem are based only on location information, while the visual information in the images is ignored. Visual features can be useful for associating detections with true targets more reliably, because the targets usually have discriminative features. In this work, we propose a combination of position and visual feature information in a discrete data association model. Moreover, we propose the use of group Lasso regularization in order to improve the identification of particular pedestrians, given that the discriminative regions are associated with particular visual blocks in the image. We find promising results for our approach in terms of precision and robustness when compared with a state-of-the-art method in standard datasets for multi-target pedestrian tracking.}, url={https://link.springer.com/chapter/10.1007/978-3-319-16634-6_49}, }
Multi-target tracking of pedestrians is a challenging task due to uncertainty about targets, caused mainly by similarity between pedestrians, occlusion over a relatively long time and a cluttered background. A usual scheme for tackling multi-target tracking is to divide it into two sub-problems: data association and trajectory estimation. A reasonable approach is based on joint optimization of a discrete model for data association and a continuous model for trajectory estimation in a Markov Random Field framework. Nonetheless, usual solutions of the data association problem are based only on location information, while the visual information in the images is ignored. Visual features can be useful for associating detections with true targets more reliably, because the targets usually have discriminative features. In this work, we propose a combination of position and visual feature information in a discrete data association model. Moreover, we propose the use of group Lasso regularization in order to improve the identification of particular pedestrians, given that the discriminative regions are associated with particular visual blocks in the image. We find promising results for our approach in terms of precision and robustness when compared with a state-of-the-art method in standard datasets for multi-target pedestrian tracking.
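The group Lasso term is worth seeing in one line: it sums unweighted l2 norms over predefined feature groups, so whole visual blocks of the detection window can be driven to zero together. A minimal sketch; the block layout is an assumption.

import numpy as np

def group_lasso(w, groups, lam=1.0):
    # lam * sum_g ||w_g||_2 over index groups (one group per visual block).
    return lam * sum(np.linalg.norm(w[g]) for g in groups)

w = np.array([0.9, 0.8, 0.0, 0.0, 0.1, 0.0])
blocks = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
print(group_lasso(w, blocks, lam=0.5))
# a fully zeroed block contributes nothing, so entire blocks can switch off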
Local Feature Selection Using Gaussian Process Regression.
Pichara, K.; and Soto, A.
Intelligent Data Analysis (IDA), 18(3). 2014.
Paper
link
bibtex
abstract
@article{Pichara:EtAl:2014, Author = {K. Pichara and A. Soto}, Title = {Local Feature Selection Using Gaussian Process Regression}, Journal = {Intelligent Data Analysis (IDA)}, Volume = {18}, Number = {3}, Year = {2014}, abstract = {Most feature selection algorithms determine a global subset of features, where all data instances are projected in order to improve classification accuracy. An attractive alternative solution is to adaptively find a local subset of features for each data instance, such that, the classification of each instance is performed according to its own selective subspace. This paper presents a novel application of Gaussian Processes that improves classification performance by learning discriminative local subsets of features for each instance in a dataset. Gaussian Processes are used to build for each available feature a function that estimates the discriminative power of the feature over all the input space. Using these functions, we are able to determine a discriminative subspace for each possible instance by locally joining the features that present the highest levels of discriminative power. New instances are then classified by using a K-NN classifier that operates in the local subspaces. Experimental results show that by using local discriminative subspaces, we are able to reach higher levels of accuracy than alternative state-of-the-art feature selection approaches. }, url = {http://saturno.ing.puc.cl/media/papers_alvaro/Karim-IDA-2014.pdf} }
Most feature selection algorithms determine a global subset of features, where all data instances are projected in order to improve classification accuracy. An attractive alternative solution is to adaptively find a local subset of features for each data instance, such that the classification of each instance is performed according to its own selective subspace. This paper presents a novel application of Gaussian Processes that improves classification performance by learning discriminative local subsets of features for each instance in a dataset. Gaussian Processes are used to build, for each available feature, a function that estimates the discriminative power of the feature over the entire input space. Using these functions, we are able to determine a discriminative subspace for each possible instance by locally joining the features that present the highest levels of discriminative power. New instances are then classified by using a K-NN classifier that operates in the local subspaces. Experimental results show that by using local discriminative subspaces, we are able to reach higher levels of accuracy than alternative state-of-the-art feature selection approaches.
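A compact sketch of the pipeline with scikit-learn: fit one Gaussian-process regressor per feature to predict that feature's local discriminative power, then classify each query with K-NN restricted to its top-scoring features. The per-feature training scores are assumed precomputed here; the paper defines its own way of measuring discriminative power.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.neighbors import KNeighborsClassifier

def fit_feature_gps(X, scores):
    # X: (n, d) training inputs; scores: (n, d) assumed local discriminative
    # power of each feature around each training point.
    return [GaussianProcessRegressor().fit(X, scores[:, j])
            for j in range(scores.shape[1])]

def classify_local(x, gps, X, y, m=5, k=7):
    power = np.array([gp.predict(x[None])[0] for gp in gps])
    subspace = np.argsort(power)[-m:]                  # top-m features at x
    knn = KNeighborsClassifier(n_neighbors=k).fit(X[:, subspace], y)
    return knn.predict(x[None, subspace])[0]           # K-NN in the local subspace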
Discriminative Hierarchical Modeling of Spatio-Temporally Composable Human Activities.
Lillo, I.; Niebles, J.; and Soto, A.
In CVPR, 2014.
Paper
link
bibtex
abstract
@inproceedings{Lillo:EtAl:2014, Author = {I. Lillo and JC. Niebles and A. Soto}, Title = {Discriminative Hierarchical Modeling of Spatio-Temporally Composable Human Activities}, booktitle = {{CVPR}}, year = {2014}, abstract = {This paper proposes a framework for recognizing complex human activities in videos. Our method describes human activities in a hierarchical discriminative model that operates at three semantic levels. At the lower level, body poses are encoded in a representative but discriminative pose dictionary. At the intermediate level, encoded poses span a space where simple human actions are composed. At the highest level, our model captures temporal and spatial compositions of actions into complex human activities. Our human activity classifier simultaneously models which body parts are relevant to the action of interest as well as their appearance and composition using a discriminative approach. By formulating model learning in a max-margin framework, our approach achieves powerful multi-class discrimination while providing useful annotations at the intermediate semantic level. We show how our hierarchical compositional model provides natural handling of occlusions. To evaluate the effectiveness of our proposed framework, we introduce a new dataset of composed human activities. We provide empirical evidence that our method achieves state-of-the-art activity classification performance on several benchmark datasets.}, url = {http://saturno.ing.puc.cl/media/papers_alvaro/activities-CVPR-14.pdf} }
This paper proposes a framework for recognizing complex human activities in videos. Our method describes human activities in a hierarchical discriminative model that operates at three semantic levels. At the lower level, body poses are encoded in a representative but discriminative pose dictionary. At the intermediate level, encoded poses span a space where simple human actions are composed. At the highest level, our model captures temporal and spatial compositions of actions into complex human activities. Our human activity classifier simultaneously models which body parts are relevant to the action of interest as well as their appearance and composition using a discriminative approach. By formulating model learning in a max-margin framework, our approach achieves powerful multi-class discrimination while providing useful annotations at the intermediate semantic level. We show how our hierarchical compositional model provides natural handling of occlusions. To evaluate the effectiveness of our proposed framework, we introduce a new dataset of composed human activities. We provide empirical evidence that our method achieves state-of-the-art activity classification performance on several benchmark datasets.
Embedded local feature selection within mixture of experts.
Peralta, B.; and Soto, A.
Information Sciences, 269: 176-187. 2014.
Paper
link
bibtex
abstract
@article{Peralta:Soto:2014, Author = {B. Peralta and A. Soto}, Title = {Embedded local feature selection within mixture of experts}, Journal = {Information Sciences}, Volume = {269}, pages = {176-187}, Year = {2014}, abstract = {A useful strategy to deal with complex classification scenarios is the divide and conquer approach. The mixture of experts (MoE) technique makes use of this strategy by jointly training a set of classifiers, or experts, that are specialized in different regions of the input space. A global model, or gate function, complements the experts by learning a function that weighs their relevance in different parts of the input space. Local feature selection appears as an attractive alternative to improve the specialization of experts and gate function, particularly, in the case of high dimensional data. In general, subsets of dimensions, or subspaces, are usually more appropriate to classify instances located in different regions of the input space. Accordingly, this work contributes with a regularized variant of MoE that incorporates an embedded process for local feature selection using L1-regularization. Experiments using artificial and real-world datasets provide evidence that the proposed method improves the classical MoE technique, in terms of accuracy and sparseness of the solution. Furthermore, our results indicate that the advantages of the proposed technique increase with the dimensionality of the data.}, url = {http://saturno.ing.puc.cl/media/papers_alvaro/RMoE.pdf} } %***********2013***************%
A useful strategy to deal with complex classification scenarios is the divide-and-conquer approach. The mixture of experts (MoE) technique makes use of this strategy by jointly training a set of classifiers, or experts, that are specialized in different regions of the input space. A global model, or gate function, complements the experts by learning a function that weighs their relevance in different parts of the input space. Local feature selection appears as an attractive alternative to improve the specialization of experts and gate function, particularly in the case of high-dimensional data. In general, subsets of dimensions, or subspaces, are usually more appropriate to classify instances located in different regions of the input space. Accordingly, this work contributes a regularized variant of MoE that incorporates an embedded process for local feature selection using L1 regularization. Experiments using artificial and real-world datasets provide evidence that the proposed method improves the classical MoE technique, in terms of accuracy and sparseness of the solution. Furthermore, our results indicate that the advantages of the proposed technique increase with the dimensionality of the data.
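Under one plausible reading, the embedded selection mechanism boils down to an L1 proximal (soft-thresholding) step applied to the gate and expert weight matrices during training, which drives individual per-feature weights of each expert exactly to zero. A minimal sketch of that step; the training loop around it is hypothetical.

import numpy as np

def l1_prox(W, lam, lr):
    # Proximal operator of lr*lam*||W||_1: shrink magnitudes, clip at zero.
    return np.sign(W) * np.maximum(np.abs(W) - lr * lam, 0.0)

# inside a (hypothetical) gradient-based training loop:
# W_expert = l1_prox(W_expert - lr * grad_expert, lam, lr)
# W_gate   = l1_prox(W_gate   - lr * grad_gate,   lam, lr)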
2013
(6)
Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition.
Lobel, H.; Vidal, R.; and Soto, A.
In ICCV, 2013.
Paper
link
bibtex
abstract
@inproceedings{Lobel-a:EtAl:2013, Author = {H. Lobel and R. Vidal and A. Soto}, Title = {Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition}, booktitle = {{ICCV}}, year = {2013}, abstract = {Currently, Bag-of-Visual-Words (BoVW) and part-based methods are the most popular approaches for visual recognition. In both cases, a mid-level representation is build on top of low level image descriptors while top levels classifiers use this mid-level representation to achieve visual recognition. While in current part-based approaches, mid and top level representations are usually jointly trained, this is not the usual case for BoVW schemes. A main reason is the complex data association problem associated to the larger size of the visual dictionary usually needed by BoVW approaches at the mid-level layer. As a further observation, typical solutions based on BoVW and part-based representations are usually limited to binary classification problems, a strategy that ignores relevant correlations among classes. In this work we propose a novel hierarchical approach for visual recognition that, in the context of a BoVW scheme, jointly learns suitable mid and top level representations. Furthermore, using a max-margin learning framework, the proposed approach directly handles the multiclass case at both levels of abstraction. We test our proposed method using several popular benchmarks datasets. As our main result, we demonstrate that by coupling learning of mid and top level representations, the proposed approach fosters sharing of discriminativity words among target classes, being able to achieve state-of-the-art recognition performance using far less visual words than previous approaches.}, url = {http://saturno.ing.puc.cl/media/papers_alvaro/finalHans-ICCV-13.pdf} }
Currently, Bag-of-Visual-Words (BoVW) and part-based methods are the most popular approaches for visual recognition. In both cases, a mid-level representation is built on top of low-level image descriptors, while top-level classifiers use this mid-level representation to achieve visual recognition. While in current part-based approaches mid- and top-level representations are usually jointly trained, this is not the usual case for BoVW schemes. A main reason is the complex data association problem associated with the larger size of the visual dictionary usually needed by BoVW approaches at the mid-level layer. As a further observation, typical solutions based on BoVW and part-based representations are usually limited to binary classification problems, a strategy that ignores relevant correlations among classes. In this work we propose a novel hierarchical approach for visual recognition that, in the context of a BoVW scheme, jointly learns suitable mid- and top-level representations. Furthermore, using a max-margin learning framework, the proposed approach directly handles the multiclass case at both levels of abstraction. We test our proposed method using several popular benchmark datasets. As our main result, we demonstrate that by coupling the learning of mid- and top-level representations, the proposed approach fosters the sharing of discriminative words among target classes, being able to achieve state-of-the-art recognition performance using far fewer visual words than previous approaches.
Human Action Recognition from Inter-Temporal Dictionaries of Key-Sequences.
Alfaro, A.; Mery, D.; and Soto, A.
In 6th Pacific-Rim Symposium on Image and Video Technology, PSIVT, 2013.
Paper
link
bibtex
abstract
@inproceedings{Alfaro:EtAl:2013, Author = { A. Alfaro and D. Mery and A. Soto}, Title = {Human Action Recognition from Inter-Temporal Dictionaries of Key-Sequences}, booktitle = {6th Pacific-Rim Symposium on Image and Video Technology, PSIVT}, year = {2013}, abstract = {This paper addresses the human action recognition in video by proposing a method based on three main processing steps. First, we tackle problems related to intraclass variations and differences in video lengths. We achieve this by reducing an input video to a set of key-sequences that represent atomic meaningful acts of each action class. Second, we use sparse coding techniques to learn a representation for each key-sequence. We then join these representations still preserving information about temporal relationships. We believe that this is a key step of our approach because it provides not only a suitable shared rep resentation to characterize atomic acts, but it also encodes global tem poral consistency among these acts. Accordingly, we call this represen tation inter-temporal acts descriptor. Third, we use this representation and sparse coding techniques to classify new videos. Finally, we show that, our approach outperforms several state-of-the-art methods when is tested using common benchmarks.}, url = {http://saturno.ing.puc.cl/media/papers_alvaro/Anali-PSIVT-13.pdf} }
This paper addresses human action recognition in video by proposing a method based on three main processing steps. First, we tackle problems related to intra-class variations and differences in video lengths. We achieve this by reducing an input video to a set of key-sequences that represent atomic meaningful acts of each action class. Second, we use sparse coding techniques to learn a representation for each key-sequence. We then join these representations while still preserving information about temporal relationships. We believe that this is a key step of our approach because it provides not only a suitable shared representation to characterize atomic acts, but also encodes global temporal consistency among these acts. Accordingly, we call this representation the inter-temporal acts descriptor. Third, we use this representation and sparse coding techniques to classify new videos. Finally, we show that our approach outperforms several state-of-the-art methods when tested on common benchmarks.
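As a rough illustration of the sparse-coding ingredient, the sketch below learns one dictionary per action class from key-sequence descriptors and classifies by smallest sparse-reconstruction residual; the descriptors, dictionary sizes and residual rule are assumptions for the example, not the paper's exact inter-temporal pipeline.

import numpy as np
from sklearn.decomposition import DictionaryLearning, SparseCoder

rng = np.random.default_rng(0)
train_desc = rng.normal(size=(150, 64))    # stand-in key-sequence descriptors
train_lab = rng.integers(0, 3, size=150)   # hypothetical action labels

# Learn a small dictionary per action class from its key-sequence descriptors.
dicts = {c: DictionaryLearning(n_components=16, transform_algorithm='omp',
                               random_state=0)
            .fit(train_desc[train_lab == c]).components_
         for c in np.unique(train_lab)}

def classify(desc, n_nonzero=5):
    # Pick the class whose dictionary reconstructs `desc` with least error.
    errs = {}
    for c, D in dicts.items():
        coder = SparseCoder(dictionary=D, transform_algorithm='omp',
                            transform_n_nonzero_coefs=n_nonzero)
        code = coder.transform(desc[None, :])
        errs[c] = float(np.linalg.norm(desc - code @ D))
    return min(errs, key=errs.get)

print(classify(rng.normal(size=64)))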
Joint Dictionary and Classifier learning for Categorization of Images using a Max-margin Framework.
Lobel, H.; Vidal, R.; Mery, D.; and Soto., A.
In 6th Pacific-Rim Symposium on Image and Video Technology, PSIVT, 2013.
Paper
link
bibtex
abstract
@inproceedings{Lobel-b:EtAl:2013, Author = {H. Lobel and R. Vidal and D. Mery and A. Soto.}, Title = {Joint Dictionary and Classifier learning for Categorization of Images using a Max-margin Framework}, booktitle = {6th Pacific-Rim Symposium on Image and Video Technology, PSIVT}, year = {2013}, abstract = {The Bag-of-Visual-Words (BoVW) model is a popular approach for visual recognition. Used successfully in many different tasks, simplicity and good performance are the main reasons for its popularity. The central aspect of this model, the visual dictionary, is used to build mid-level representations based on low level image descriptors. Classifiers are then trained using these mid-level representations to perform categorization. While most works based on BoVW models have been focused on learning a suitable dictionary or on proposing a suitable pooling strategy, little effort has been devoted to explore and improve the coupling between the dictionary and the top-level classifiers, in order to gen- erate more discriminative models. This problem can be highly complex due to the large dictionary size usually needed by these methods. Also, most BoVW based systems usually perform multiclass categorization using a one-vs-all strat- egy, ignoring relevant correlations among classes. To tackle the previous issues, we propose a novel approach that jointly learns dictionary words and a proper top- level multiclass classifier. We use a max-margin learning framework to minimize a regularized energy formulation, allowing us to propagate labeled information to guide the commonly unsupervised dictionary learning process. As a result we produce a dictionary that is more compact and discriminative. We test our method on several popular datasets, where we demonstrate that our joint optimization strategy induces a word sharing behavior among the target classes, being able to achieve state-of-the-art performance using far less visual words than previous approaches. }, url = {http://saturno.ing.puc.cl/media/papers_alvaro/Hans-PSIVT-13.pdf} }
The Bag-of-Visual-Words (BoVW) model is a popular approach for visual recognition; used successfully in many different tasks, simplicity and good performance are the main reasons for its popularity. The central aspect of this model, the visual dictionary, is used to build mid-level representations based on low-level image descriptors. Classifiers are then trained using these mid-level representations to perform categorization. While most works based on BoVW models have focused on learning a suitable dictionary or on proposing a suitable pooling strategy, little effort has been devoted to exploring and improving the coupling between the dictionary and the top-level classifiers in order to generate more discriminative models. This problem can be highly complex due to the large dictionary size usually needed by these methods. Also, most BoVW-based systems usually perform multiclass categorization using a one-vs-all strategy, ignoring relevant correlations among classes. To tackle the previous issues, we propose a novel approach that jointly learns dictionary words and a proper top-level multiclass classifier. We use a max-margin learning framework to minimize a regularized energy formulation, allowing us to propagate labeled information to guide the commonly unsupervised dictionary learning process. As a result we produce a dictionary that is more compact and discriminative. We test our method on several popular datasets, where we demonstrate that our joint optimization strategy induces a word-sharing behavior among the target classes, achieving state-of-the-art performance using far fewer visual words than previous approaches.
Enhancing K-Means Using Class Labels.
Peralta, B.; Espinace, P.; and Soto, A.
Intelligent Data Analysis (IDA), 17(6): 1023-1039. 2013.
Paper
link
bibtex
abstract
@article{Peralta:EtAl:2013, Author = {B. Peralta and P. Espinace and A. Soto}, Title = {Enhancing K-Means Using Class Labels}, Journal = {Intelligent Data Analysis (IDA)}, Volume = {17}, Number = {6}, pages = {1023-1039}, Year = {2013}, abstract = {Clustering is a relevant problem in machine learning where the main goal is to locate meaningful partitions of unlabeled data. In the case of labeled data, a related problem is supervised clustering, where the objective is to locate class- uniform clusters. Most current approaches to supervised clustering optimize a score related to cluster purity with respect to class labels. In particular, we present Labeled K-Means (LK-Means), an algorithm for supervised clustering based on a variant of K-Means that incorporates information about class labels. LK-Means replaces the classical cost function of K-Means by a convex combination of the joint cost associated to: (i) A discriminative score based on class labels, and (ii) A generative score based on a traditional metric for unsupervised clustering. We test the performance of LK-Means using standard real datasets and an application for object recognition. Moreover, we also compare its performance against classical K-Means and a popular K-Medoids-based supervised clustering method. Our experiments show that, in most cases, LK-Means outperforms the alternative techniques by a considerable margin. Furthermore, LK-Means presents execution times considerably lower than the alternative supervised clustering method under evaluation. }, url = {http://saturno.ing.puc.cl/media/papers_alvaro/supClustering.pdf} }
Clustering is a relevant problem in machine learning where the main goal is to locate meaningful partitions of unlabeled data. In the case of labeled data, a related problem is supervised clustering, where the objective is to locate class-uniform clusters. Most current approaches to supervised clustering optimize a score related to cluster purity with respect to class labels. In particular, we present Labeled K-Means (LK-Means), an algorithm for supervised clustering based on a variant of K-Means that incorporates information about class labels. LK-Means replaces the classical cost function of K-Means by a convex combination of the joint cost associated with: (i) a discriminative score based on class labels, and (ii) a generative score based on a traditional metric for unsupervised clustering. We test the performance of LK-Means using standard real datasets and an application for object recognition. Moreover, we also compare its performance against classical K-Means and a popular K-Medoids-based supervised clustering method. Our experiments show that, in most cases, LK-Means outperforms the alternative techniques by a considerable margin. Furthermore, LK-Means presents execution times considerably lower than the alternative supervised clustering method under evaluation.
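A minimal sketch of the convex-combination idea, assuming a purity-based discriminative term and the usual squared distance as the generative term; the weighting alpha and both scores are illustrative assumptions, not the paper's exact definitions.

import numpy as np

def lk_means(X, y, k, alpha=0.5, iters=20, seed=0):
    # Assign each point by a convex combination of (i) a discriminative
    # cost from class labels and (ii) the squared distance to the centroid;
    # alpha = 0 recovers plain K-Means.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    classes = np.unique(y)
    hist = np.ones((k, len(classes))) / len(classes)     # per-cluster class mix
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        d2 = d2 / (d2.max() + 1e-12)                     # scale to [0, 1]
        purity = hist[:, np.searchsorted(classes, y)].T  # P(own label | cluster)
        cost = alpha * (1.0 - purity) + (1.0 - alpha) * d2
        assign = cost.argmin(axis=1)
        for j in range(k):
            m = assign == j
            if m.any():
                centers[j] = X[m].mean(axis=0)
                counts = np.array([(y[m] == c).sum() for c in classes])
                hist[j] = (counts + 1) / (counts.sum() + len(classes))
    return centers, assign

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 3])
y = np.repeat([0, 1], 50)
centers, assign = lk_means(X, y, k=2)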
Indoor Scene Recognition by a Mobile Robot Through Adaptive Object Detection.
Espinace, P.; Kollar, T.; Roy, N.; and Soto, A.
Robotics and Autonomous Systems, 61(9). 2013.
Paper
link
bibtex
abstract
@article{Espinace:EtAl:2013, Author = {P. Espinace and T. Kollar and N. Roy and A. Soto}, Title = {Indoor Scene Recognition by a Mobile Robot Through Adaptive Object Detection}, Journal = {Robotics and Autonomous Systems}, Volume = {61}, Number = {9}, Year = {2013}, abstract = {Mobile Robotics has achieved notably progress, however, to increase the complexity of the tasks that mobile robots can perform in natural environments, we need to provide them with a greater semantic understanding of their surrounding. In particular, identifying indoor scenes, such as an office or a kitchen, is a highly valuable perceptual ability for an indoor mobile robot, and in this paper we propose a new technique to achieve this goal. As a distinguishing feature, we use common objects, such as doors or furnitures, as a key intermediate representation to recognize indoor scenes. We frame our method as a generative probabilistic hierarchical model, where we use object category classifiers to associate low-level visual features to objects, and contextual relations to associate objects to scenes. The inherent seman- tic interpretation of common objects allows us to use rich sources of online data to populate the probabilistic terms of our model. In contrast to alterna- tive computer vision based methods, we boost performance by exploiting the embedded and dynamic nature of a mobile robot. In particular, we increase detection accuracy and efficiency by using a 3D range sensor that allows us to implement a focus of attention mechanism based on geometric and struc- tural information. Furthermore, we use concepts from information theory to propose an adaptive scheme that limits computational load by selectively guiding the search for informative objects. The operation of this scheme is facilitated by the dynamic nature of a mobile robot that is constantly changing its field of view. We test our approach using real data captured by a mo- bile robot navigating in office and home environments. Our results indicate that the proposed approach outperforms several state-of-the-art techniques }, url = {http://saturno.ing.puc.cl/media/papers_alvaro/Final-RAS-2013.pdf} }
Mobile Robotics has achieved notable progress; however, to increase the complexity of the tasks that mobile robots can perform in natural environments, we need to provide them with a greater semantic understanding of their surroundings. In particular, identifying indoor scenes, such as an office or a kitchen, is a highly valuable perceptual ability for an indoor mobile robot, and in this paper we propose a new technique to achieve this goal. As a distinguishing feature, we use common objects, such as doors or furniture, as a key intermediate representation to recognize indoor scenes. We frame our method as a generative probabilistic hierarchical model, where we use object category classifiers to associate low-level visual features to objects, and contextual relations to associate objects to scenes. The inherent semantic interpretation of common objects allows us to use rich sources of online data to populate the probabilistic terms of our model. In contrast to alternative computer vision based methods, we boost performance by exploiting the embedded and dynamic nature of a mobile robot. In particular, we increase detection accuracy and efficiency by using a 3D range sensor that allows us to implement a focus of attention mechanism based on geometric and structural information. Furthermore, we use concepts from information theory to propose an adaptive scheme that limits computational load by selectively guiding the search for informative objects. The operation of this scheme is facilitated by the dynamic nature of a mobile robot that is constantly changing its field of view. We test our approach using real data captured by a mobile robot navigating in office and home environments. Our results indicate that the proposed approach outperforms several state-of-the-art techniques.
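The object-to-scene layer can be pictured as a small naive-Bayes computation: assuming object detections are conditionally independent given the scene, the scene posterior is a product of per-object terms. The numbers below are invented for illustration; the paper populates such terms from online data sources.

import numpy as np

scenes = ["office", "kitchen"]
prior = np.array([0.5, 0.5])
# Hypothetical P(object present | scene), aligned with `scenes`.
p_obj = {"monitor": np.array([0.8, 0.05]),
         "microwave": np.array([0.02, 0.6])}

def scene_posterior(detections):
    # detections: dict object -> bool (detected or not in the current view).
    post = prior.copy()
    for obj, seen in detections.items():
        like = p_obj[obj] if seen else 1.0 - p_obj[obj]
        post *= like
    return post / post.sum()

print(dict(zip(scenes, scene_posterior({"monitor": True, "microwave": False}))))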
Automated Design of a Computer Vision System for Food Quality Evaluation.
Mery, D.; Pedreschi, F.; and Soto, A.
Food and Bioprocess Technology, 6(8): 2093-2108. 2013.
Paper
link
bibtex
abstract
@article{Mery:EtAl:2013, Author = {D. Mery and F. Pedreschi and A. Soto}, Title = {Automated Design of a Computer Vision System for Food Quality Evaluation}, Journal = {Food and Bioprocess Technology}, Volume = {6}, number = {8}, pages = {2093-2108}, Year = {2013}, abstract = {Considerable research efforts in computer vision applied to food quality evaluation have been developed in the last years; however, they have been concentrated on using or developing tailored methods based on visual features that are able to solve a specific task. Nevertheless, today's computer capabilities are giving us new ways to solve complex computer vision problems. In particular, a new paradigm on machine learning techniques has emerged, posing the task of recognizing visual patterns as a search problem based on training data and a hypothesis space composed of visual features and suitable classifiers. Furthermore, we are now able to extract, process, and test at the same time more image features and classifiers than before. Thus, we propose a general framework that designs a computer vision system automatically, i.e., it finds, without human interaction, the features and the classifiers for a given application, avoiding the classical trial-and-error framework commonly used by human designers. The key idea of the proposed framework is to select automatically, from a large set of features and a bank of classifiers, those features and classifiers that achieve the highest performance. We tested our framework on eight different food quality evaluation problems, yielding a classification performance of 95% or more in every case. The proposed framework was implemented as a Matlab Toolbox available for noncommercial purposes.}, url = {http://saturno.ing.puc.cl/media/papers_alvaro/Food-Mery-2012.pdf} } %***********2012***************%
Considerable research efforts in computer vision applied to food quality evaluation have been developed in the last years; however, they have been concentrated on using or developing tailored methods based on visual features that are able to solve a specific task. Nevertheless, today's computer capabilities are giving us new ways to solve complex computer vision problems. In particular, a new paradigm on machine learning techniques has emerged, posing the task of recognizing visual patterns as a search problem based on training data and a hypothesis space composed of visual features and suitable classifiers. Furthermore, we are now able to extract, process, and test at the same time more image features and classifiers than before. Thus, we propose a general framework that designs a computer vision system automatically, i.e., it finds, without human interaction, the features and the classifiers for a given application, avoiding the classical trial-and-error framework commonly used by human designers. The key idea of the proposed framework is to select automatically, from a large set of features and a bank of classifiers, those features and classifiers that achieve the highest performance. We tested our framework on eight different food quality evaluation problems, yielding a classification performance of 95% or more in every case. The proposed framework was implemented as a Matlab Toolbox available for noncommercial purposes.
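Stripped to its core, the framework is cross-validated model selection over a feature-selection step crossed with a classifier bank. The sketch below conveys the loop on a stand-in dataset; the published toolbox is in Matlab, and the dataset, the bank and the candidate subset sizes here are assumptions.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # stand-in for extracted food-image features

bank = {"svm-rbf": SVC(), "knn": KNeighborsClassifier(),
        "lda": LinearDiscriminantAnalysis()}
best = None
for name, clf in bank.items():
    for k in (10, 20, 40):  # candidate feature-subset sizes
        pipe = make_pipeline(SelectKBest(f_classif, k=k), clf)
        score = cross_val_score(pipe, X, y, cv=5).mean()
        if best is None or score > best[0]:
            best = (score, name, k)
print("best: %.3f with %s on %d features" % best)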
2012
(3)
Discriminative local subspaces in gene expression data for effective gene function prediction.
Puelma, T.; Gutierrez, R.; and Soto, A.
Bioinformatics, 28(17): 2256-64. 2012.
Paper
link
bibtex
abstract
@article{Puelma:EtAl:2012, Author = {T. Puelma and R. Gutierrez and A. Soto}, Title = {Discriminative local subspaces in gene expression data for effective gene function prediction}, Journal = {Bioinformatics}, Volume = {28}, Number = {17}, pages={2256-64}, Year = {2012}, abstract = {Motivation: Massive amounts of genome-wide gene expression data have become available, motivating the development of computatio- nal approaches that leverage this information to predict gene func- tion. Among successful approaches, supervised machine learning methods, such as Support Vector Machines, have shown superior prediction accuracy. However, these methods lack the simple biologi- cal intuition provided by coexpression networks, limiting their practical usefulness. Results: In this work we present Discriminative Local Subspaces (DLS), a novel method that combines supervised machine learning and coexpression techniques with the goal of systematically predict genes involved in specific biological processes of interest. Unlike tra- ditional coexpression networks, DLS uses the knowledge available in Gene Ontology (GO) to generate informative training sets that guide the discovery of expression signatures: expression patterns that are discriminative for genes involved in the biological process of interest. By linking genes coexpressed with these signatures, DLS is able to construct a discriminative coexpression network that links both, known and previously uncharacterized genes, for the selected bio- logical process. This paper focuses on the algorithm behind DLS and shows its predictive power using an Arabidopsis thaliana dataset and a representative set of 101 GO-terms from the Biological Process Ontology. Our results show that DLS has a superior average accuracy than both, Support Vector Machines and Coexpression Networks. Thus, DLS is able to provide the prediction accuracy of supervised learning methods, while maintaining the intuitive understanding of coexpression networks. Availability and Implementation: A MATLAB R implementation of DLS is available at http://virtualplant.bio.puc.cl/ cgi-bin/Lab/tools.cgi. }, url = {http://saturno.ing.puc.cl/media/papers_alvaro/DLS_Revised_Paper.pdf} }
Motivation: Massive amounts of genome-wide gene expression data have become available, motivating the development of computational approaches that leverage this information to predict gene function. Among successful approaches, supervised machine learning methods, such as Support Vector Machines, have shown superior prediction accuracy. However, these methods lack the simple biological intuition provided by coexpression networks, limiting their practical usefulness. Results: In this work we present Discriminative Local Subspaces (DLS), a novel method that combines supervised machine learning and coexpression techniques with the goal of systematically predicting genes involved in specific biological processes of interest. Unlike traditional coexpression networks, DLS uses the knowledge available in Gene Ontology (GO) to generate informative training sets that guide the discovery of expression signatures: expression patterns that are discriminative for genes involved in the biological process of interest. By linking genes coexpressed with these signatures, DLS is able to construct a discriminative coexpression network that links both known and previously uncharacterized genes for the selected biological process. This paper focuses on the algorithm behind DLS and shows its predictive power using an Arabidopsis thaliana dataset and a representative set of 101 GO terms from the Biological Process Ontology. Our results show that DLS has a superior average accuracy to both Support Vector Machines and coexpression networks. Thus, DLS is able to provide the prediction accuracy of supervised learning methods while maintaining the intuitive understanding of coexpression networks. Availability and Implementation: A MATLAB implementation of DLS is available at http://virtualplant.bio.puc.cl/cgi-bin/Lab/tools.cgi.
Adaptive hierarchical contexts for object recognition with conditional mixture of trees.
Peralta, B.; Espinace, P.; and Soto, A.
In BMVC, 2012.
Paper
link
bibtex
abstract
@inproceedings{Peralta:EtAl:2012, Author = {B. Peralta and P. Espinace and A. Soto}, Title = {Adaptive hierarchical contexts for object recognition with conditional mixture of trees}, booktitle = {{BMVC}}, year = {2012}, abstract = {Robust category-level object recognition is currently a major goal for the computer vision community. Intra-class and pose variations, as well as, background clutter and partial occlusions are some of the main difficulties to achieve this goal. Contextual in- formation, in the form of object co-occurrences and spatial constraints, has been suc- cessfully applied to improve object recognition performance, however, previous work considers only fixed contextual relations that do not depend of the type of scene under inspection. In this work, we present a method that learns adaptive conditional relation- ships that depend on the type of scene being analyzed. In particular, we propose a model based on a conditional mixture of trees that is able to capture contextual relationships among objects using global information about a scene. Our experiments show that the adaptive specialization of contextual relationships improves object recognition accuracy outperforming previous state-of-the-art approaches. }, url = {FinalBMVC-12.pdf} }
Robust category-level object recognition is currently a major goal for the computer vision community. Intra-class and pose variations, as well as background clutter and partial occlusions, are some of the main difficulties to achieve this goal. Contextual information, in the form of object co-occurrences and spatial constraints, has been successfully applied to improve object recognition performance; however, previous work considers only fixed contextual relations that do not depend on the type of scene under inspection. In this work, we present a method that learns adaptive conditional relationships that depend on the type of scene being analyzed. In particular, we propose a model based on a conditional mixture of trees that is able to capture contextual relationships among objects using global information about a scene. Our experiments show that the adaptive specialization of contextual relationships improves object recognition accuracy, outperforming previous state-of-the-art approaches.
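In generic notation, the conditional mixture described here factorizes as below; this is an illustrative rendering, not necessarily the paper's exact parametrization:

p(o_1, \dots, o_M \mid g) \;=\; \sum_{k=1}^{K} \pi_k(g) \prod_{i=1}^{M} p_k\!\left(o_i \mid o_{\mathrm{pa}_k(i)}\right)

Here g is a global descriptor of the scene, \pi_k(g) is a gating term that selects among K tree-structured context models, and \mathrm{pa}_k(i) is the parent of object i in tree k; a fixed, scene-independent context corresponds to K = 1.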
Indoor Mobile Robotics at Grima, PUC.
Caro, L.; Correa, J.; Espinace, P.; Maturana, D.; Mitnik, R.; Montabone, S.; Pszszfolkowski, S.; Langdon, D.; Araneda, A.; Mery, D.; Torres, M.; and Soto, A.
Journal of Intelligent and Robotic Systems, 66(1-2): 151-165. 2012.
Paper
link
bibtex
abstract
@article{Caro:EtAl:2012, Author = {L. Caro, J. Correa, P. Espinace, D. Maturana, R. Mitnik, S. Montabone, S. Pszszfolkowski, D. Langdon, A. Araneda, D. Mery, M. Torres, A. Soto}, Title = {Indoor Mobile Robotics at Grima, PUC}, Journal = {Journal of Intelligent and Robotic Systems}, Volume = {66}, pages={151-165}, Number = {1-2}, Year = {2012}, abstract = {This paper describes the main activities and achievements of our research group on Ma- chine Intelligence and Robotics (Grima) at the Computer Science Department, Pontificia Uni- versidad Catolica de Chile (PUC). Since 2002, we have been developing an active research in the area of indoor autonomous social robots. Our main focus has been the cognitive side of Robotics, where we have developed algorithms for autonomous navigation using wheeled robots, scene recognition using vision and 3D range sen- sors, and social behaviors using Markov Deci- sion Processes, among others. As a distinguish- ing feature, in our research we have followed a probabilistic approach, deeply rooted in ma- chine learning and Bayesian statistical techniques. Among our main achievements are an increasing list of publications in main Robotics conference and journals, and the consolidation of a research group with more than 25 people among full- time professors, visiting researchers, and graduate students. }, url = {http://saturno.ing.puc.cl/media/papers_alvaro/Latam-2012.pdf} } %***********2011***************%
This paper describes the main activities and achievements of our research group on Machine Intelligence and Robotics (Grima) at the Computer Science Department, Pontificia Universidad Catolica de Chile (PUC). Since 2002, we have been developing active research in the area of indoor autonomous social robots. Our main focus has been the cognitive side of Robotics, where we have developed algorithms for autonomous navigation using wheeled robots, scene recognition using vision and 3D range sensors, and social behaviors using Markov Decision Processes, among others. As a distinguishing feature, in our research we have followed a probabilistic approach, deeply rooted in machine learning and Bayesian statistical techniques. Among our main achievements are an increasing list of publications in major Robotics conferences and journals, and the consolidation of a research group with more than 25 people among full-time professors, visiting researchers, and graduate students.
2011
(4)
Learning Discriminative Local Binary Patterns for Face Recognition.
Maturana, D.; Mery, D.; and Soto, A.
In 9th IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2011.
Paper
link
bibtex
abstract
@inproceedings{Maturana:EtAl:2011, Author = {D. Maturana and D. Mery and A. Soto}, Title = {Learning Discriminative Local Binary Patterns for Face Recognition}, booktitle = {9th IEEE International Conference on Automatic Face and Gesture Recognition (FG)}, year = {2011}, abstract = {Histograms of Local Binary Patterns (LBPs) and variations thereof are a popular local visual descriptor for face recognition. So far, most variations of LBP are designed by hand or are learned with non-supervised methods. In this work we propose a simple method to learn discriminative LBPs in a supervised manner. The method represents an LBP-like descriptor as a set of pixel comparisons within a neighborhood and heuristically seeks for a set of pixel comparisons so as to maximize a Fisher separability criterion for the resulting his- tograms. Tests on standard face recognition datasets show that this method can create compact yet discriminative descriptors.}, url = {http://saturno.ing.puc.cl/media/papers_alvaro/FG-2011.pdf} }
Histograms of Local Binary Patterns (LBPs) and variations thereof are a popular local visual descriptor for face recognition. So far, most variations of LBP are designed by hand or are learned with unsupervised methods. In this work we propose a simple method to learn discriminative LBPs in a supervised manner. The method represents an LBP-like descriptor as a set of pixel comparisons within a neighborhood and heuristically seeks a set of pixel comparisons that maximizes a Fisher separability criterion for the resulting histograms. Tests on standard face recognition datasets show that this method can create compact yet discriminative descriptors.
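The descriptor-as-pixel-comparisons idea can be made concrete in a few lines: each comparison contributes one bit of the code, and candidate comparison sets are scored by a Fisher-style separability of the class histograms. The random search and the toy data below stand in for the paper's heuristic and for real face images.

import numpy as np

def lbp_hist(img, comparisons):
    # `comparisons`: list of ((dy1, dx1), (dy2, dx2)) pixel pairs; each pair
    # contributes one bit of the per-pixel code.
    h, w = img.shape
    m = 2  # margin so all offsets stay inside the image
    codes = np.zeros((h - 2 * m, w - 2 * m), dtype=int)
    for b, ((y1, x1), (y2, x2)) in enumerate(comparisons):
        a = img[m + y1:h - m + y1, m + x1:w - m + x1]
        c = img[m + y2:h - m + y2, m + x2:w - m + x2]
        codes |= (a > c).astype(int) << b
    return np.bincount(codes.ravel(), minlength=2 ** len(comparisons))

def fisher_score(hists, labels):
    # Ratio of between-class to within-class scatter, summed over bins.
    mu = hists.mean(0)
    num, den = 0.0, 0.0
    for c in np.unique(labels):
        Hc = hists[labels == c]
        num += len(Hc) * ((Hc.mean(0) - mu) ** 2).sum()
        den += ((Hc - Hc.mean(0)) ** 2).sum()
    return num / (den + 1e-9)

rng = np.random.default_rng(0)
imgs = rng.random((40, 16, 16))           # toy "face regions"
labels = np.repeat(np.arange(4), 10)
best = (-1.0, None)
for _ in range(50):                       # random search stands in for the heuristic
    comps = [tuple(map(tuple, rng.integers(-2, 3, size=(2, 2)))) for _ in range(4)]
    H = np.stack([lbp_hist(im, comps) for im in imgs])
    s = fisher_score(H, labels)
    if s > best[0]:
        best = (s, comps)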
Automated Fish Bone Detection using X-ray Testing.
Mery, D.; Lillo, I.; Loebel, H.; Riffo, V.; Soto, A.; Cipriano, A.; and Aguilera, J.
Journal of Food Engineering, 105(3): 485-492. 2011.
Paper
link
bibtex
abstract
@article{Mery:EtAl:2011, Author = {D. Mery and I. Lillo and H. Loebel and V. Riffo, A. Soto and A. Cipriano and JM. Aguilera}, Title = {Automated Fish Bone Detection using X-ray Testing}, journal = {Journal of Food Engineering}, volume = {105}, number = {3}, pages={485-492}, year = {2011}, abstract = {In countries where fish is often consumed, fish bones are some of the most frequently ingested foreign bodies encountered in foods. In the production of fish fillets, fish bone detection is performed by human inspection using their sense of touch and vision which can lead to misclassification. Effective detection of fish bones in the quality control process would help avoid this problem. For this reason, an X-ray machine vision approach to automatically detect fish bones in fish fillets was developed. This paper describes our approach and the corresponding experiments with salmon and trout fillets. In the experiments, salmon X-ray images using 10×10 pixels detection windows and 24 intensity features (selected from 279 features) were analyzed. The methodology was validated using representative fish bones and trouts provided by a salmon industry and yielded a detection performance of 99%. We believe that the proposed approach opens new possibilities in the field of automated visual inspection of salmon, trout and other similar fish. }, url = {http://saturno.ing.puc.cl/media/papers_alvaro/2011-JFoodEng-SalmonX.pdf}, }
In countries where fish is often consumed, fish bones are some of the most frequently ingested foreign bodies encountered in foods. In the production of fish fillets, fish bone detection is performed by human inspection using the sense of touch and vision, which can lead to misclassification. Effective detection of fish bones in the quality control process would help avoid this problem. For this reason, an X-ray machine vision approach to automatically detect fish bones in fish fillets was developed. This paper describes our approach and the corresponding experiments with salmon and trout fillets. In the experiments, salmon X-ray images were analyzed using 10×10-pixel detection windows and 24 intensity features (selected from 279 features). The methodology was validated using representative fish bones and trout provided by the salmon industry and yielded a detection performance of 99%. We believe that the proposed approach opens new possibilities in the field of automated visual inspection of salmon, trout and other similar fish.
Active Learning and Subspace Clustering for Anomaly Detection.
Pichara, K.; and Soto, A.
Intelligent Data Analysis (IDA), 15(2): 151-171. 2011.
Paper
link
bibtex
abstract
@article{Pichara:EtAl:2011, Author = {K. Pichara and A. Soto}, Title = {Active Learning and Subspace Clustering for Anomaly Detection}, journal = { Intelligent Data Analysis (IDA)}, volume = {15}, number = {2}, pages={151-171}, year = {2011}, abstract = {Today, anomaly detection is a highly valuable application in the analysis of current huge datasets. Insurance companies, banks and many manufacturing industries need systems to help humans to detect anomalies in their daily information. In general, anomalies are a very small fraction of the data, therefore their detection is not an easy task. Usually real sources of an anomaly are given by specific values expressed on selective dimensions of datasets, furthermore, many anomalies are not really interesting for humans, due to the fact that interestingness of anomalies is categorized subjectively by the human user. In this paper we propose a new semi-supervised algorithm that actively learns to detect relevant anomalies by interacting with an expert user in order to obtain semantic information about user preferences. Our approach is based on 3 main steps. First, a Bayes network identifies an initial set of candidate anomalies. Afterwards, a subspace clustering technique identifies relevant subsets of dimensions. Finally, a probabilistic active learning scheme, based on properties of Dirichlet distribution, uses the feedback from an expert user to efficiently search for relevant anomalies. Our results, using synthetic and real datasets, indicate that, under noisy data and anomalies presenting regular patterns, our approach correctly identifies relevant anomalies. }, url = {http://saturno.ing.puc.cl/media/papers_alvaro/IDA-2011.pdf} }
Today, anomaly detection is a highly valuable application in the analysis of current huge datasets. Insurance companies, banks and many manufacturing industries need systems that help humans to detect anomalies in their daily information. In general, anomalies are a very small fraction of the data, therefore their detection is not an easy task. Usually, the real sources of an anomaly are given by specific values expressed on selected dimensions of a dataset; furthermore, many anomalies are not really interesting for humans, since the interestingness of anomalies is judged subjectively by the human user. In this paper we propose a new semi-supervised algorithm that actively learns to detect relevant anomalies by interacting with an expert user in order to obtain semantic information about user preferences. Our approach is based on three main steps. First, a Bayes network identifies an initial set of candidate anomalies. Afterwards, a subspace clustering technique identifies relevant subsets of dimensions. Finally, a probabilistic active learning scheme, based on properties of the Dirichlet distribution, uses the feedback from an expert user to efficiently search for relevant anomalies. Our results, using synthetic and real datasets, indicate that, under noisy data and anomalies presenting regular patterns, our approach correctly identifies relevant anomalies.
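As a loose illustration of the expert-in-the-loop step, the sketch below keeps Beta counts of how often the expert confirms each candidate group and queries via Thompson sampling; this Beta-Bernoulli stand-in is an assumption for the example, simpler than the Dirichlet-based scheme of the paper.

import numpy as np

rng = np.random.default_rng(1)
K = 5                          # candidate anomaly groups (e.g., from subspace clustering)
truth = rng.random(K) < 0.4    # hidden: which groups the expert deems relevant
pos = np.ones(K)               # Beta pseudo-counts of confirmations...
neg = np.ones(K)               # ...and rejections per group

for _ in range(20):
    sample = rng.beta(pos, neg)      # Thompson sampling over relevance
    q = int(np.argmax(sample))       # query the most promising group
    if truth[q]:                     # expert feedback
        pos[q] += 1
    else:
        neg[q] += 1

print("posterior mean relevance:", np.round(pos / (pos + neg), 2))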
Mixing Hierarchical Contexts for Object Recognition.
Peralta, B.; and Soto, A.
In CIARP, 2011.
Paper
link
bibtex
abstract
@inproceedings{Peralta:EtAl:2011, Author = {B. Peralta and A. Soto}, Title = {Mixing Hierarchical Contexts for Object Recognition}, booktitle = {{CIARP}}, year = {2011}, abstract = {Robust category-level object recognition is currently a major goal for the Computer Vision community. Intra-class and pose variations, as well as, background clutter and partial occlusions are some of the main difficulties to achieve this goal. Contextual information in the form of ob- ject co-ocurrences and spatial contraints has been successfully applied to reduce the inherent uncertainty of the visual world. Recently, Choi et al. [5] propose the use of a tree-structured graphical model to capture contextual relations among objects. Under this model there is only one possible fixed contextual relation among subsets of objects. In this work we extent Choi et al. approach by using a mixture model to consider the case that contextual relations among objects depend on scene type. Our experiments highlight the advantages of our proposal, showing that the adaptive specialization of contextual relations improves object recogni- tion and object detection performances. }, url = {http://saturno.ing.puc.cl/media/papers_alvaro/Peralta-2011.pdf} } %***********2010***************%
Robust category-level object recognition is currently a major goal for the Computer Vision community. Intra-class and pose variations, as well as background clutter and partial occlusions, are some of the main difficulties to achieve this goal. Contextual information in the form of object co-occurrences and spatial constraints has been successfully applied to reduce the inherent uncertainty of the visual world. Recently, Choi et al. [5] proposed the use of a tree-structured graphical model to capture contextual relations among objects. Under this model there is only one possible fixed contextual relation among subsets of objects. In this work we extend Choi et al.'s approach by using a mixture model to consider the case that contextual relations among objects depend on the scene type. Our experiments highlight the advantages of our proposal, showing that the adaptive specialization of contextual relations improves object recognition and object detection performance.
2010
(6)
Face Recognition with Decision Tree-based Local Binary Patterns.
Maturana, D.; Mery, D.; and Soto, A.
In Proc. of Asian Conference on Computer Vision (ACCV-2010), 2010.
Paper
link
bibtex
abstract
@inproceedings{Maturana:EtAl:2010, Author = {D. Maturana and D. Mery and A. Soto}, Title = {Face Recognition with Decision Tree-based Local Binary Patterns}, booktitle = {Proc. of Asian Conference on Computer Vision (ACCV-2010)}, year = {2010}, abstract = {Many state-of-the-art face recognition algorithms use image descriptors based on features known as Local Binary Patterns (LBPs). While many variations of LBP exist, so far none of them can automati- cally adapt to the training data. We introduce and analyze a novel gen- eralization of LBP that learns the most discriminative LBP-like features for each facial region in a supervised manner. Since the proposed method is based on Decision Trees, we call it Decision Tree Local Binary Pat- terns or DT-LBPs. Tests on standard face recognition datasets show the superiority of DT-LBP with respect of several state-of-the-art feature descriptors regularly used in face recognition applications. }, url = {http://saturno.ing.puc.cl/media/papers_alvaro/ACCV-2010.pdf} }
Many state-of-the-art face recognition algorithms use image descriptors based on features known as Local Binary Patterns (LBPs). While many variations of LBP exist, so far none of them can automatically adapt to the training data. We introduce and analyze a novel generalization of LBP that learns the most discriminative LBP-like features for each facial region in a supervised manner. Since the proposed method is based on Decision Trees, we call it Decision Tree Local Binary Patterns, or DT-LBPs. Tests on standard face recognition datasets show the superiority of DT-LBP with respect to several state-of-the-art feature descriptors regularly used in face recognition applications.
Automated Detection of Fish Bones in Salmon Fillets using X-ray Testing.
Mery, D.; Lillo, I.; Loebel, H.; Riffo, V.; Soto, A.; Cipriano, A.; and Aguilera, J.
In Proc. of 4th Pacific-Rim Symposium on Image and Video Technology (PSIVT-2010), 2010.
Paper
link
bibtex
abstract
@inproceedings{Mery-PSIVT:EtAl:2010, Author = {D. Mery and I. Lillo and H. Loebel and V. Riffo and A. Soto and A. Cipriano and JM. Aguilera}, Title = {Automated Detection of Fish Bones in Salmon Fillets using X-ray Testing}, booktitle = {Proc. of 4th Pacific-Rim Symposium on Image and Video Technology (PSIVT-2010)}, year = {2010}, abstract = {X-ray testing is playing an increasingly important role in food quality assurance. In the production of fish fillets, however, fish bone detection is performed by human operators using their sense of touch and vision which can lead to misclassification. In countries where fish is often consumed, fish bones are some of the most frequently ingested foreign bodies encountered in foods. Effective detection of fish bones in the quality control process would help avoid this problem. For this reason, we developed an X-ray machine vision approach to au- tomatically detect fish bones in fish fillets. This paper describes our approach and the corresponding validation experiments with salmon fillets. The approach consists of six steps: 1) A digital X-ray image is taken of the fish fillet being tested. 2) The X-ray image is filtered and enhanced to facilitate the detection of fish bones. 3) Potential fish bones in the image are segmented using band pass filtering, thresholding and morphological techniques. 4) Intensity features of the enhanced X-ray image are extracted from small detection windows that are defined in those regions where potential fish bones were segmented. 5) A classifier is used to discriminate between ‘bones’ and ‘no-bones’ classes in the detection windows. 6) Finally, fish bones in the X-ray image are isolated using morphological operations applied on the corresponding segments classified as ‘bones’. In the experiments we used a high resolution flat panel detector with the capacity to capture up to a 6 million pixel digital X-ray image. In the training phase, we analyzed 20 representative salmon fillets, 7700 detection windows (10×10 pixels) and 279 intensity features. Cross validation yielded a detection performance of 95% using a support vector machine classifier with only 24 selected features. We believe that the proposed approach opens new possibilities in the field of automated visual inspection of salmon and other similar fish. }, url = {http://saturno.ing.puc.cl/media/papers_alvaro/PSIVT-2010.pdf} }
X-ray testing is playing an increasingly important role in food quality assurance. In the production of fish fillets, however, fish bone detection is performed by human operators using their sense of touch and vision, which can lead to misclassification. In countries where fish is often consumed, fish bones are some of the most frequently ingested foreign bodies encountered in foods. Effective detection of fish bones in the quality control process would help avoid this problem. For this reason, we developed an X-ray machine vision approach to automatically detect fish bones in fish fillets. This paper describes our approach and the corresponding validation experiments with salmon fillets. The approach consists of six steps: 1) a digital X-ray image is taken of the fish fillet being tested; 2) the X-ray image is filtered and enhanced to facilitate the detection of fish bones; 3) potential fish bones in the image are segmented using band-pass filtering, thresholding and morphological techniques; 4) intensity features of the enhanced X-ray image are extracted from small detection windows that are defined in those regions where potential fish bones were segmented; 5) a classifier is used to discriminate between ‘bones’ and ‘no-bones’ classes in the detection windows; 6) finally, fish bones in the X-ray image are isolated using morphological operations applied on the corresponding segments classified as ‘bones’. In the experiments we used a high-resolution flat panel detector with the capacity to capture up to a 6-million-pixel digital X-ray image. In the training phase, we analyzed 20 representative salmon fillets, 7700 detection windows (10×10 pixels) and 279 intensity features. Cross validation yielded a detection performance of 95% using a support vector machine classifier with only 24 selected features. We believe that the proposed approach opens new possibilities in the field of automated visual inspection of salmon and other similar fish.
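Steps 2-5 can be sketched compactly: a difference-of-Gaussians band-pass, a threshold, 10×10 windows around the segmented blobs, and intensity features fed to an SVM. The window size and the use of intensity features follow the abstract; the filter scales, threshold and feature list are assumptions.

import numpy as np
from scipy.ndimage import gaussian_filter, label, find_objects
from sklearn.svm import SVC

def candidate_windows(xray, size=10, thresh=2.0):
    # Steps 2-3: band-pass enhancement and segmentation of potential bones.
    band = gaussian_filter(xray, 1.0) - gaussian_filter(xray, 4.0)
    mask = band > thresh * band.std()
    lab, _ = label(mask)
    wins = []
    for sl in find_objects(lab):
        cy = (sl[0].start + sl[0].stop) // 2
        cx = (sl[1].start + sl[1].stop) // 2
        y0 = np.clip(cy - size // 2, 0, xray.shape[0] - size)
        x0 = np.clip(cx - size // 2, 0, xray.shape[1] - size)
        wins.append(xray[y0:y0 + size, x0:x0 + size])
    return wins

def intensity_features(win):
    # Step 4: a few toy intensity features per 10x10 window.
    return [win.mean(), win.std(), win.min(), win.max(), np.median(win)]

# Step 5 would train a classifier such as SVC() on labeled windows:
# clf = SVC().fit([intensity_features(w) for w in wins], labels)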
Quality Classification of Corn Tortillas using Computer Vision.
Mery, D.; Chanona-Perez, J.; Soto, A.; Aguilera, J.; Cipriano, A.; Velez-Riverab, N.; Arzate-Vazquez, I.; and Gutierrez-Lopez, G.
Journal of Food Engineering, 101(4): 357-364. 2010.
Paper
link
bibtex
abstract
@article{Mery:EtAl:2010, Author = {D. Mery and J. Chanona-Perez and A. Soto and JM. Aguilera and A. Cipriano and N. Velez-Riverab and I. Arzate-Vazquez and G. Gutierrez-Lopez}, Title = {Quality Classification of Corn Tortillas using Computer Vision}, Journal = {Journal of Food Engineering}, Volume = {101}, Number = {4}, Pages = {357-364}, Year = {2010}, abstract = {Computer vision is playing an increasingly important role in automated visual food inspection. However quality control in tortilla production is still performed by human operators which may lead to misclassification due to their subjectivity and fatigue. In order to reduce the need for human operators and therefore misclassification, we developed a computer vision framework to automatically classify the quality of corn tortillas according to five hedonic sub-classes given by a sensorial panel. The proposed framework analyzed 750 corn tortillas obtained from 15 different Mexican commercial stores which were either small, medium or large in size. More than 2300 geometric and color features were extracted from 1500 images capturing both sides of the 750 tortillas. After implementing a feature selection algorithm, in which the most relevant features were selected for the classification of the five sub-classes, only 64 features were required to design a classifier based on support vector machines. Cross validation yielded a performance of 95% in the classification of the five hedonic sub-classes. Additionally, using only 10 of the selected features and a simple statistical classifier, it was possible to determine the origin of the tortillas with a performance of 96%. We believe that the proposed framework opens up new possibilities in the field of automated visual inspection of tortillas. }, url = {http://saturno.ing.puc.cl/media/papers_alvaro/Tortillas-2010.pdf} }
Computer vision is playing an increasingly important role in automated visual food inspection. However, quality control in tortilla production is still performed by human operators, which may lead to misclassification due to their subjectivity and fatigue. In order to reduce the need for human operators and therefore misclassification, we developed a computer vision framework to automatically classify the quality of corn tortillas according to five hedonic sub-classes given by a sensory panel. The proposed framework analyzed 750 corn tortillas obtained from 15 different Mexican commercial stores, which were either small, medium or large in size. More than 2300 geometric and color features were extracted from 1500 images capturing both sides of the 750 tortillas. After implementing a feature selection algorithm, in which the most relevant features were selected for the classification of the five sub-classes, only 64 features were required to design a classifier based on support vector machines. Cross validation yielded a performance of 95% in the classification of the five hedonic sub-classes. Additionally, using only 10 of the selected features and a simple statistical classifier, it was possible to determine the origin of the tortillas with a performance of 96%. We believe that the proposed framework opens up new possibilities in the field of automated visual inspection of tortillas.
Active visual perception for mobile robot localization.
Correa, J.; and Soto, A.
Journal of Intelligent and Robotic Systems, 58(3-4): 339-354. 2010.
Paper
link
bibtex
abstract
@article{Correa:EtAl:2010, Author = {J. Correa and A. Soto}, Title = {Active visual perception for mobile robot localization}, Journal = {Journal of Intelligent and Robotic Systems}, Volume = {58}, Number = {3-4}, Pages = {339-354}, Year = {2010}, abstract = {Localization is a key issue for a mobile robot, in particular in environments where a globally accurate positioning system, such as GPS, is not available. In these environments, accurate and efficient robot localization is not a trivial task, as an increase in accuracy usually leads to an impoverishment in efficiency and viceversa. Active perception appears as an appealing way to improve the localization process by increasing the richness of the information acquired from the environment. In this paper, we present an active perception strategy for a mobile robot provided with a visual sensor mounted on a pan-tilt mechanism. The visual sensor has a limited field of view, so the goal of the active perception strategy is to use the pan-tilt unit to direct the sensor to informative parts of the environment. To achieve this goal, we use a topological map of the environment and a Bayesian non-parametric estimation of robot position based on a particle filter. We slightly modify the regular implementation of this filter by including an additional step that selects the best perceptual action using Monte Carlo estimations. We understand the best perceptual action as the one that produces the greatest reduction in uncertainty about the robot position. We also consider in our optimization function a cost term that favors efficient perceptual actions. Previous works have proposed active perception strategies for robot localization, but mainly in the context of range sensors, grid representations of the environment, and parametric techniques, such as the extended Kalman filter. Accordingly, the main contributions of this work are: i) Development of a sound strategy for active selection of perceptual actions in the context of a visual sensor and a topological map; ii) Real time operation using a modified version of the particle filter and Monte Carlo based estimations; iii) Implementation and testing of these ideas using simulations and a real case scenario. Our results indicate that, in terms of accuracy of robot localization, the proposed approach decreases mean average error and standard deviation with respect to a passive perception scheme. Furthermore, in terms of efficiency, the active scheme is able to operate in real time without adding a relevant overhead to the regular robot operation. }, url = {http://saturno.ing.puc.cl/media/papers_alvaro/Intell-Robots-2010.pdf} }
Localization is a key issue for a mobile robot, in particular in environments where a globally accurate positioning system, such as GPS, is not available. In these environments, accurate and efficient robot localization is not a trivial task, as an increase in accuracy usually leads to an impoverishment in efficiency, and vice versa. Active perception appears as an appealing way to improve the localization process by increasing the richness of the information acquired from the environment. In this paper, we present an active perception strategy for a mobile robot provided with a visual sensor mounted on a pan-tilt mechanism. The visual sensor has a limited field of view, so the goal of the active perception strategy is to use the pan-tilt unit to direct the sensor to informative parts of the environment. To achieve this goal, we use a topological map of the environment and a Bayesian non-parametric estimation of robot position based on a particle filter. We slightly modify the regular implementation of this filter by including an additional step that selects the best perceptual action using Monte Carlo estimations. We understand the best perceptual action as the one that produces the greatest reduction in uncertainty about the robot position. We also consider in our optimization function a cost term that favors efficient perceptual actions. Previous works have proposed active perception strategies for robot localization, but mainly in the context of range sensors, grid representations of the environment, and parametric techniques, such as the extended Kalman filter. Accordingly, the main contributions of this work are: i) development of a sound strategy for active selection of perceptual actions in the context of a visual sensor and a topological map; ii) real-time operation using a modified version of the particle filter and Monte Carlo based estimations; iii) implementation and testing of these ideas using simulations and a real case scenario. Our results indicate that, in terms of accuracy of robot localization, the proposed approach decreases mean average error and standard deviation with respect to a passive perception scheme. Furthermore, in terms of efficiency, the active scheme is able to operate in real time without adding a relevant overhead to the regular robot operation.
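The Monte Carlo action-selection step can be sketched on a 1-D toy: for each candidate pan direction, simulate readings from the current particle set, compute the posterior entropy each reading would leave, and pick the action with the lowest expectation. The sensor model and the entropy-only criterion (the paper also adds a cost term) are assumptions for the example.

import numpy as np

rng = np.random.default_rng(2)
particles = rng.uniform(0, 10, size=200)   # 1-D toy robot positions
weights = np.ones(200) / 200

def obs_likelihood(z, x, action):
    # Toy sensor: `action` is the landmark the camera points at; z = 1 if the
    # landmark is detected, with detection probability falling off in distance.
    p_det = np.exp(-0.5 * (x - action) ** 2)
    return p_det if z == 1 else 1.0 - p_det

def expected_entropy(action, n_sim=30):
    H = 0.0
    for _ in range(n_sim):                 # Monte Carlo over possible readings
        x = rng.choice(particles, p=weights)
        z = int(rng.random() < np.exp(-0.5 * (x - action) ** 2))
        w = weights * obs_likelihood(z, particles, action)
        w = w / w.sum()
        H += -(w * np.log(w + 1e-12)).sum()
    return H / n_sim

actions = [2.0, 5.0, 8.0]                  # candidate pan directions
print("most informative perceptual action:", min(actions, key=expected_entropy))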
Indoor Scene Recognition Through Object Detection.
Espinace, P.; Kollar, T.; Soto, A.; and Roy, N.
In Proc. of IEEE Int. Conf. on Robotics and Automation (ICRA), 2010.
Paper
link
bibtex
abstract
@inproceedings{Espinace:EtAl:2010, Author = {P. Espinace and T. Kollar and A. Soto and N. Roy}, Title = {Indoor Scene Recognition Through Object Detection}, booktitle = {Proc. of IEEE Int. Conf. on Robotics and Automation (ICRA)}, year = {2010}, abstract = {Scene recognition is a highly valuable percep- tual ability for an indoor mobile robot, however, current approaches for scene recognition present a significant drop in performance for the case of indoor scenes. We believe that this can be explained by the high appearance variability of indoor environments. This stresses the need to include high- level semantic information in the recognition process. In this work we propose a new approach for indoor scene recognition based on a generative probabilistic hierarchical model that uses common objects as an intermediate semantic representation. Under this model, we use object classifiers to associate low- level visual features to objects, and at the same time, we use contextual relations to associate objects to scenes. As a further contribution, we improve the performance of current state-of- the-art category-level object classifiers by including geometrical information obtained from a 3D range sensor that facilitates the implementation of a focus of attention mechanism within a Monte Carlo sampling scheme. We test our approach using real data, showing significant advantages with respect to previous state-of-the-art methods. }, url = {http://saturno.ing.puc.cl/media/papers_alvaro/Icra-2010.pdf} }
Scene recognition is a highly valuable perceptual ability for an indoor mobile robot; however, current approaches for scene recognition present a significant drop in performance for the case of indoor scenes. We believe that this can be explained by the high appearance variability of indoor environments. This stresses the need to include high-level semantic information in the recognition process. In this work we propose a new approach for indoor scene recognition based on a generative probabilistic hierarchical model that uses common objects as an intermediate semantic representation. Under this model, we use object classifiers to associate low-level visual features to objects, and at the same time, we use contextual relations to associate objects to scenes. As a further contribution, we improve the performance of current state-of-the-art category-level object classifiers by including geometrical information obtained from a 3D range sensor that facilitates the implementation of a focus of attention mechanism within a Monte Carlo sampling scheme. We test our approach using real data, showing significant advantages with respect to previous state-of-the-art methods.
Human Detection Using a Mobile Platform and Novel Features Derived From a Visual Saliency Mechanism.
Montabone, S.; and Soto, A.
Image and Vision Computing, 28(3): 391-402. 2010.
Paper
link
bibtex
abstract
@article{Montabone:EtAl:2010, Author = {S. Montabone and A. Soto}, Title = {Human Detection Using a Mobile Platform and Novel Features Derived From a Visual Saliency Mechanism}, Journal = {Image and Vision Computing}, Volume = {28}, Number = {3}, Pages = {391-402}, Year = {2010}, abstract = {Human detection is a key ability for an increasing number of applications that operate in human-inhabited environments or need to interact with a human user. Currently, most successful approaches to human detection are based on background subtraction techniques that apply only to the case of static cameras or cameras with highly constrained motions. Furthermore, many applications rely on features derived from specific human poses, such as systems based on features derived from the human face, which is only visible when a person is facing the detecting camera. In this work, we present a new computer vision algorithm designed to operate with moving cameras and to detect humans in different poses under partial or complete view of the human body. We follow a standard pattern recognition approach based on four main steps: (i) preprocessing to achieve color constancy and stereo pair calibration, (ii) segmentation using depth continuity information, (iii) feature extraction based on visual saliency, and (iv) classification using a neural network. The main novelty of our approach lies in the feature extraction step, where we propose novel features derived from a visual saliency mechanism. In contrast to previous works, we do not use a pyramidal decomposition to run the saliency algorithm, but we implement it at the original image resolution using the so-called integral image. Our results indicate that our method: (i) outperforms state-of-the-art techniques for human detection based on face detectors, (ii) outperforms state-of-the-art techniques for complete human body detection based on different sets of visual features, and (iii) operates in real time onboard a mobile platform, such as a mobile robot (15 fps).}, url = {http://saturno.ing.puc.cl/media/papers_alvaro/ImageVisionComp-10.pdf} }
Human detection is a key ability for an increasing number of applications that operate in human-inhabited environments or need to interact with a human user. Currently, most successful approaches to human detection are based on background subtraction techniques that apply only to the case of static cameras or cameras with highly constrained motions. Furthermore, many applications rely on features derived from specific human poses, such as systems based on features derived from the human face, which is only visible when a person is facing the detecting camera. In this work, we present a new computer vision algorithm designed to operate with moving cameras and to detect humans in different poses under partial or complete view of the human body. We follow a standard pattern recognition approach based on four main steps: (i) preprocessing to achieve color constancy and stereo pair calibration, (ii) segmentation using depth continuity information, (iii) feature extraction based on visual saliency, and (iv) classification using a neural network. The main novelty of our approach lies in the feature extraction step, where we propose novel features derived from a visual saliency mechanism. In contrast to previous works, we do not use a pyramidal decomposition to run the saliency algorithm, but we implement it at the original image resolution using the so-called integral image. Our results indicate that our method: (i) outperforms state-of-the-art techniques for human detection based on face detectors, (ii) outperforms state-of-the-art techniques for complete human body detection based on different sets of visual features, and (iii) operates in real time onboard a mobile platform, such as a mobile robot (15 fps).
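The integral-image trick mentioned in this abstract is worth illustrating: after one linear pass over the image, the sum of any rectangular region (and hence a center-surround saliency response) costs only four array lookups. A minimal sketch follows; it shows only the constant-time box sum the features rely on, not the paper's saliency features themselves.

import numpy as np

def integral_image(img):
    # Cumulative sum over rows then columns: ii[y, x] = sum of img[:y+1, :x+1].
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, top, left, bottom, right):
    # Sum of img[top:bottom, left:right] from four lookups on the integral image.
    total = ii[bottom - 1, right - 1]
    if top > 0:
        total -= ii[top - 1, right - 1]
    if left > 0:
        total -= ii[bottom - 1, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

img = np.arange(16, dtype=float).reshape(4, 4)
ii = integral_image(img)
assert box_sum(ii, 1, 1, 3, 3) == img[1:3, 1:3].sum()  # O(1) regardless of box size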
2009
(4)
Collaborative Robotic Instruction: A Graph Teaching Experience.
Mitnik, R.; Recabarren, M.; Nussbaum, M.; and Soto, A.
Computers and Education, 53(2): 330-342. 2009.
Paper
link
bibtex
abstract
@article{Mitnik:EtAl:2009, Author = {R. Mitnik and M. Recabarren and M. Nussbaum and A. Soto}, Title = {Collaborative Robotic Instruction: A Graph Teaching Experience}, Journal = {Computers and Education}, Volume = {53}, Number = {2}, Pages = {330-342}, Year = {2009}, abstract = {Graphing is a key skill in the study of Physics. Drawing and interpreting graphs play a key role in the understanding of science, while the lack of these skills has proved to be a handicap and a limiting factor in the learning of scientific concepts. It has been observed that, despite the amount of previous graph-working experience, students of all ages experience a series of difficulties when trying to comprehend graphs or when trying to relate them with physical concepts such as position, velocity and acceleration. Several computational tools have emerged to improve the students’ understanding of kinematical graphs; however, these approaches fail to develop graph construction skills. On the other hand, robots have opened new opportunities in learning. Nevertheless, most of their educational applications focus on Robotics-related subjects, such as robot programming, robot construction, and artificial intelligence. This paper describes a robotic activity based on face-to-face computer-supported collaborative learning. By means of a set of handhelds and a robot wirelessly interconnected, the aim of the activity is to develop graph construction and graph interpretation skills while also reinforcing kinematics concepts. Results show that students using the robotic activity achieve a significant increase in their graph interpreting skills. Moreover, when compared with a similar computer-simulated activity, it proved to be almost twice as effective. Finally, the robotic application proved to be a highly motivating activity for the students, fostering collaboration among them.}, url = {http://saturno.ing.puc.cl/media/papers_alvaro/Mitnik_2009_Computers-&-Education.pdf} }
Graphing is a key skill in the study of Physics. Drawing and interpreting graphs play a key role in the understanding of science, while the lack of these skills has proved to be a handicap and a limiting factor in the learning of scientific concepts. It has been observed that, despite the amount of previous graph-working experience, students of all ages experience a series of difficulties when trying to comprehend graphs or when trying to relate them with physical concepts such as position, velocity and acceleration. Several computational tools have emerged to improve the students’ understanding of kinematical graphs; however, these approaches fail to develop graph construction skills. On the other hand, robots have opened new opportunities in learning. Nevertheless, most of their educational applications focus on Robotics-related subjects, such as robot programming, robot construction, and artificial intelligence. This paper describes a robotic activity based on face-to-face computer-supported collaborative learning. By means of a set of handhelds and a robot wirelessly interconnected, the aim of the activity is to develop graph construction and graph interpretation skills while also reinforcing kinematics concepts. Results show that students using the robotic activity achieve a significant increase in their graph interpreting skills. Moreover, when compared with a similar computer-simulated activity, it proved to be almost twice as effective. Finally, the robotic application proved to be a highly motivating activity for the students, fostering collaboration among them.
Performance Evaluation of the Covariance Descriptor for Target Detection.
Cortez-Cargill, P.; Undurraga-Rius, C.; Mery, D.; and Soto, A.
In Proc. of XXVIII Int. Conf. of the Chilean Computer Science Society/IEEE CS Press, 2009.
Paper
link
bibtex
abstract
@inproceedings{Cortez:EtAl:2009, Author = {P. Cortez-Cargill and C. Undurraga-Rius and D. Mery and A. Soto}, Title = {Performance Evaluation of the Covariance Descriptor for Target Detection}, booktitle = {Proc. of XXVIII Int. Conf. of the Chilean Computer Science Society/IEEE CS Press}, year = {2009}, abstract = {In computer vision, there has been strong progress in creating new image descriptors. A descriptor that has recently appeared is the Covariance Descriptor, but there have not been studies about the different methodologies for its construction. To address this problem we have conducted an analysis of the contribution of diverse features of an image to the descriptor, and therefore their contribution to the detection of varied targets, in our case faces and pedestrians. To this end, we have defined a methodology to determine the performance of the covariance matrix created from different characteristics, which allows us to determine the best set of features for each problem: face detection and pedestrian detection. We have also established that not every combination of features can be used, since a correlation between them may not exist. Finally, when the analysis is performed with the best set of features, we reach a performance of 99% for the face detection problem and 85% for the pedestrian detection problem. With this we hope to have built a more solid base for choosing features for this descriptor, allowing us to move forward to other topics such as object recognition or tracking.}, url = {http://saturno.ing.puc.cl/media/papers_alvaro/Final-Proceedings-Cortez_Undurraga_Mery_Soto_SCCC2009.pdf} }
In computer vision, there has been strong progress in creating new image descriptors. A descriptor that has recently appeared is the Covariance Descriptor, but there have not been studies about the different methodologies for its construction. To address this problem we have conducted an analysis of the contribution of diverse features of an image to the descriptor, and therefore their contribution to the detection of varied targets, in our case faces and pedestrians. To this end, we have defined a methodology to determine the performance of the covariance matrix created from different characteristics, which allows us to determine the best set of features for each problem: face detection and pedestrian detection. We have also established that not every combination of features can be used, since a correlation between them may not exist. Finally, when the analysis is performed with the best set of features, we reach a performance of 99% for the face detection problem and 85% for the pedestrian detection problem. With this we hope to have built a more solid base for choosing features for this descriptor, allowing us to move forward to other topics such as object recognition or tracking.
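For readers unfamiliar with it, a region covariance descriptor is simply the covariance matrix of per-pixel feature vectors computed over an image region. The sketch below uses one assumed feature set, (x, y, intensity, |Ix|, |Iy|); the paper's contribution is precisely the comparison of such feature sets, which this sketch does not reproduce.

import numpy as np

def covariance_descriptor(patch):
    # Per-pixel features: (x, y, intensity, |dI/dx|, |dI/dy|); one assumed choice.
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    gy, gx = np.gradient(patch.astype(float))
    feats = np.stack([xs.ravel(), ys.ravel(), patch.ravel(),
                      np.abs(gx).ravel(), np.abs(gy).ravel()])
    # Rows are variables, columns are pixels: np.cov yields a 5x5 symmetric
    # descriptor whose size is independent of the patch size.
    return np.cov(feats)

patch = np.random.default_rng(0).random((32, 32))
C = covariance_descriptor(patch)
print(C.shape)  # (5, 5)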
An ensemble of Discriminative Local Subspaces in Microarray Data for Gene Ontology Annotation Predictions.
Puelma, T.; Soto, A.; and Gutierrez, R.
In Proc. of 1st Chilean Workshop on Pattern Recognition (CWPR), pages 52-61, 2009.
Paper
link
bibtex
abstract
@inproceedings{Puelma:EtAl:2009, Author = {T. Puelma and A. Soto and R. Gutierrez}, Title = {An ensemble of Discriminative Local Subspaces in Microarray Data for Gene Ontology Annotation Predictions}, booktitle = {Proc. of 1st Chilean Workshop on Pattern Recognition (CWPR)}, pages = {52-61}, year = {2009}, abstract = {Genome sequencing has made it possible to know almost every gene of many organisms. However, understanding the functions of most genes is still an open problem. In this paper, we present a novel machine learning method to predict functions of unknown genes based on gene expression data and Gene Ontology annotations. Most function prediction algorithms developed in the past do not exploit the discriminative power of supervised learning. In contrast, our method uses this power to find discriminative local subspaces that are suitable to perform gene function prediction. Cross-validation tests are performed on artificial and real data and compared with a state-of-the-art method. Preliminary results show that, overall, our method outperforms the other approach in terms of precision and recall, giving insights into the importance of a good selection of discriminative experiments.}, url = {http://saturno.ing.puc.cl/media/papers_alvaro/DLS-Final-v2.pdf} }
Genome sequencing has made it possible to know almost every gene of many organisms. However, understanding the functions of most genes is still an open problem. In this paper, we present a novel machine learning method to predict functions of unknown genes based on gene expression data and Gene Ontology annotations. Most function prediction algorithms developed in the past do not exploit the discriminative power of supervised learning. In contrast, our method uses this power to find discriminative local subspaces that are suitable to perform gene function prediction. Cross-validation tests are performed on artificial and real data and compared with a state-of-the-art method. Preliminary results show that, overall, our method outperforms the other approach in terms of precision and recall, giving insights into the importance of a good selection of discriminative experiments.
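A rough way to picture a discriminative local subspace is to rank experiments (features) by a class-separation score for one Gene Ontology term and then classify in the reduced space. The Fisher-style score and 1-NN classifier below are simplifications for illustration, not the paper's exact method.

import numpy as np

def discriminative_subspace(X, y, k=10):
    # X: genes x experiments expression matrix; y: 1 if the gene has the GO term.
    pos, neg = X[y == 1], X[y == 0]
    # Fisher-style score per experiment: between-class gap over within-class spread.
    score = (pos.mean(0) - neg.mean(0)) ** 2 / (pos.var(0) + neg.var(0) + 1e-9)
    return np.argsort(score)[::-1][:k]  # indices of most discriminative experiments

def predict(X_train, y_train, x_new, subspace):
    # 1-NN in the selected subspace.
    d = np.linalg.norm(X_train[:, subspace] - x_new[subspace], axis=1)
    return y_train[np.argmin(d)]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
y = (X[:, 3] + X[:, 7] > 0).astype(int)  # toy label depends on experiments 3 and 7
print(discriminative_subspace(X, y, k=5))  # experiments 3 and 7 should rank highly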
Face Recognition with Local Binary Patterns, Spatial Pyramid Histograms and Naive Bayes Nearest Neighbor classification.
Maturana, D.; Mery, D.; and Soto, A.
In Proc. of XXVIII Int. Conf. of the Chilean Computer Science Society/IEEE CS Press, 2009.
Paper
link
bibtex
abstract
@inproceedings{Maturana:EtAl:2009, Author = {D. Maturana and D. Mery and A. Soto}, Title = {Face Recognition with Local Binary Patterns, Spatial Pyramid Histograms and Naive Bayes Nearest Neighbor classification}, booktitle = {Proc. of XXVIII Int. Conf. of the Chilean Computer Science Society/IEEE CS Press}, year = {2009}, abstract = {Face recognition algorithms commonly assume that face images are well aligned and have a similar pose – yet in many practical applications it is impossible to meet these conditions. Therefore, extending face recognition to unconstrained face images has become an active area of research. To this end, histograms of Local Binary Patterns (LBP) have proven to be highly discriminative descriptors for face recognition. Nonetheless, most LBP-based algorithms use a rigid descriptor matching strategy that is not robust against pose variation and misalignment. We propose two algorithms for face recognition that are designed to deal with pose variations and misalignment. We also incorporate an illumination normalization step that increases robustness against lighting variations. The proposed algorithms use descriptors based on histograms of LBP and perform descriptor matching with spatial pyramid matching (SPM) and Naive Bayes Nearest Neighbor (NBNN), respectively. Our contribution is the inclusion of flexible spatial matching schemes that use an image-to-class relation to provide improved robustness with respect to intra-class variations. We compare the accuracy of the proposed algorithms against Ahonen’s original LBP-based face recognition system and two baseline holistic classifiers on four standard datasets. Our results indicate that the algorithm based on NBNN outperforms the other solutions, and does so more markedly in the presence of pose variations.}, url = {http://saturno.ing.puc.cl/media/papers_alvaro/Final-Daniel-09.pdf} }
Face recognition algorithms commonly assume that face images are well aligned and have a similar pose – yet in many practical applications it is impossible to meet these conditions. Therefore, extending face recognition to unconstrained face images has become an active area of research. To this end, histograms of Local Binary Patterns (LBP) have proven to be highly discriminative descriptors for face recognition. Nonetheless, most LBP-based algorithms use a rigid descriptor matching strategy that is not robust against pose variation and misalignment. We propose two algorithms for face recognition that are designed to deal with pose variations and misalignment. We also incorporate an illumination normalization step that increases robustness against lighting variations. The proposed algorithms use descriptors based on histograms of LBP and perform descriptor matching with spatial pyramid matching (SPM) and Naive Bayes Nearest Neighbor (NBNN), respectively. Our contribution is the inclusion of flexible spatial matching schemes that use an image-to-class relation to provide improved robustness with respect to intra-class variations. We compare the accuracy of the proposed algorithms against Ahonen’s original LBP-based face recognition system and two baseline holistic classifiers on four standard datasets. Our results indicate that the algorithm based on NBNN outperforms the other solutions, and does so more markedly in the presence of pose variations.
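The LBP descriptor at the core of this work compares each pixel with its 8 neighbors and packs the comparisons into an 8-bit code, whose histogram describes a region. A minimal, unoptimized sketch follows; the spatial pyramid matching and NBNN stages are omitted.

import numpy as np

def lbp_image(img):
    # 8-neighbor Local Binary Pattern: one bit per neighbor >= center pixel.
    img = img.astype(float)
    c = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.uint8) << bit
    return code

def lbp_histogram(img, bins=256):
    # Normalized 256-bin histogram of LBP codes: the region descriptor.
    h, _ = np.histogram(lbp_image(img), bins=bins, range=(0, bins))
    return h / h.sum()

img = np.random.default_rng(0).integers(0, 256, size=(64, 64))
print(lbp_histogram(img)[:8])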
2008
(8)
Detection of Anomalies in Large Datasets Using an Active Learning Scheme Based on Dirichlet Distributions.
Pichara, K.; Soto, A.; and Araneda, A.
In Advances in Artificial Intelligence, Iberamia-08, LNCS 5290, pages 163-172, 2008.
Paper
link
bibtex
abstract
@inproceedings{Pichara:EtAl:2008, Author = {K. Pichara and A. Soto and A. Araneda}, Title = {Detection of Anomalies in Large Datasets Using an Active Learning Scheme Based on Dirichlet Distributions}, booktitle = {Advances in Artificial Intelligence, Iberamia-08, LNCS 5290}, pages = {163-172}, year = {2008}, abstract = {Today, the detection of anomalous records is a highly valuable application in the analysis of current huge datasets. In this paper we propose a new algorithm that, with the help of a human expert, efficiently explores a dataset with the goal of detecting relevant anomalous records. Under this scheme the computer selectively asks the expert for data labeling, looking for relevant semantic feedback in order to improve its knowledge about what characterizes a relevant anomaly. Our rationale is that while computers can process huge amounts of low-level data, an expert has high-level semantic knowledge to efficiently lead the search. We build upon our previous work based on Bayesian networks that provides an initial set of potential anomalies. In this paper, we augment this approach with an active learning scheme based on the clustering properties of Dirichlet distributions. We test the performance of our algorithm using synthetic and real datasets. Our results indicate that, under noisy data and anomalies presenting regular patterns, our approach significantly reduces the rate of false positives, while decreasing the time to reach the relevant anomalies.}, url = {http://saturno.ing.puc.cl/media/papers_alvaro/ActiveLearning.pdf} }
Today, the detection of anomalous records is a highly valuable application in the analysis of current huge datasets. In this paper we propose a new algorithm that, with the help of a human expert, efficiently explores a dataset with the goal of detecting relevant anomalous records. Under this scheme the computer selectively asks the expert for data labeling, looking for relevant semantic feedback in order to improve its knowledge about what characterizes a relevant anomaly. Our rationale is that while computers can process huge amounts of low-level data, an expert has high-level semantic knowledge to efficiently lead the search. We build upon our previous work based on Bayesian networks that provides an initial set of potential anomalies. In this paper, we augment this approach with an active learning scheme based on the clustering properties of Dirichlet distributions. We test the performance of our algorithm using synthetic and real datasets. Our results indicate that, under noisy data and anomalies presenting regular patterns, our approach significantly reduces the rate of false positives, while decreasing the time to reach the relevant anomalies.
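The active-learning loop can be caricatured as follows: candidate anomalies are grouped into clusters, a Beta/Dirichlet posterior tracks how often the expert confirms queries from each cluster, and the next query is drawn from the currently most promising cluster. The Thompson-sampling selection below is an illustrative stand-in, not the paper's exact algorithm.

import random

def active_anomaly_search(clusters, expert, n_queries=20):
    # clusters: {cluster_id: [records]}; expert(record) -> True if relevant anomaly.
    # Beta(alpha, beta) posterior per cluster over "probability a query is relevant".
    alpha = {c: 1.0 for c in clusters}
    beta = {c: 1.0 for c in clusters}
    found = []
    for _ in range(n_queries):
        pending = [c for c in clusters if clusters[c]]
        if not pending:
            break
        # Thompson sampling: draw a plausible relevance rate per cluster,
        # then query a record from the most promising cluster.
        c = max(pending, key=lambda k: random.betavariate(alpha[k], beta[k]))
        record = clusters[c].pop()
        if expert(record):
            alpha[c] += 1.0
            found.append(record)
        else:
            beta[c] += 1.0
    return found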
An autonomous educational mobile robot mediator.
Mitnik, R.; Nussbaum, M.; and Soto, A.
Autonomous Robots, 25(4): 367-382. 2008.
Paper
link
bibtex
abstract
@article{Mitnik:EtAl:2008, Author = {R. Mitnik and M. Nussbaum and A. Soto}, Title = {An autonomous educational mobile robot mediator}, Journal = {Autonomous Robots}, Volume = {25}, Number = {4}, Pages = {367-382}, Year = {2008}, abstract = {So far, most of the applications of robotic technology to education have mainly focused on supporting the teaching of subjects that are closely related to the Robotics field, such as robot programming, robot construction, or mechatronics. Moreover, most of the applications have used the robot as an end or a passive tool of the learning activity, where the robot has been constructed or programmed. In this paper, we present a novel application of robotic technologies to education, where we use the real-world situatedness of a robot to teach non-robotics-related subjects, such as math and physics. Furthermore, we also provide the robot with a suitable degree of autonomy to actively guide and mediate in the development of the educational activity. We present our approach as an educational framework based on a collaborative and constructivist learning environment, where the robot is able to act as an interaction mediator capable of managing the interactions occurring among the working students. We illustrate the use of this framework by a 4-step methodology that is used to implement two educational activities. These activities were tested at local schools with encouraging results. Accordingly, the main contributions of this work are: i) A novel use of a mobile robot to illustrate and teach relevant concepts and properties of the real world; ii) A novel use of robots as mediators that autonomously guide an educational activity using a collaborative and constructivist learning approach; iii) The implementation and testing of these ideas in a real scenario, working with students at local schools.}, url = {http://saturno.ing.puc.cl/media/papers_alvaro/EducationalRobot.pdf} }
So far, most of the applications of robotic technology to education have mainly focused on supporting the teaching of subjects that are closely related to the Robotics field, such as robot programming, robot construction, or mechatronics. Moreover, most of the applications have used the robot as an end or a passive tool of the learning activity, where the robot has been constructed or programmed. In this paper, we present a novel application of robotic technologies to education, where we use the real-world situatedness of a robot to teach non-robotics-related subjects, such as math and physics. Furthermore, we also provide the robot with a suitable degree of autonomy to actively guide and mediate in the development of the educational activity. We present our approach as an educational framework based on a collaborative and constructivist learning environment, where the robot is able to act as an interaction mediator capable of managing the interactions occurring among the working students. We illustrate the use of this framework by a 4-step methodology that is used to implement two educational activities. These activities were tested at local schools with encouraging results. Accordingly, the main contributions of this work are: i) A novel use of a mobile robot to illustrate and teach relevant concepts and properties of the real world; ii) A novel use of robots as mediators that autonomously guide an educational activity using a collaborative and constructivist learning approach; iii) The implementation and testing of these ideas in a real scenario, working with students at local schools.
Unsupervised Identification of Useful Visual Landmarks Using Multiple Segmentations and Top-Down Feedback.
Espinace, P.; Langdon, D.; and Soto, A.
Robotics and Autonomous Systems, 56(6): 538-548. 2008.
Paper
link
bibtex
abstract
@article{Espinace:Soto:2008a, Author = {P. Espinace and D. Langdon and A. Soto}, Title = {Unsupervised Identification of Useful Visual Landmarks Using Multiple Segmentations and Top-Down Feedback}, Journal = {Robotics and Autonomous Systems}, Volume = {56}, Number = {6}, Pages = {538-548}, Year = {2008}, abstract = {In this paper, we tackle the problem of unsupervised selection and subsequent recognition of visual landmarks in image sequences acquired by an indoor mobile robot. This is a highly valuable perceptual capability for a wide variety of robotic applications, in particular autonomous navigation. Our method combines a bottom-up data-driven approach with top-down feedback provided by high-level semantic representations. The bottom-up approach is based on three main mechanisms: visual attention, area segmentation, and landmark characterization. As there is no segmentation method that works properly in every situation, we integrate multiple segmentation algorithms in order to increase the robustness of the approach. The top-down feedback is provided by two information sources: i) An estimation of the robot position that reduces the search scope for potential matches with previously selected landmarks, ii) A set of weights that, according to the results of previous recognitions, controls the influence of each segmentation algorithm in the recognition of each landmark. We test our approach with encouraging results in three datasets corresponding to real-world scenarios.}, url = {http://saturno.ing.puc.cl/media/papers_alvaro/Pablo-RAS-08.pdf} }
In this paper, we tackle the problem of unsupervised selection and subsequent recognition of visual landmarks in image sequences acquired by an indoor mobile robot. This is a highly valuable perceptual capability for a wide variety of robotic applications, in particular autonomous navigation. Our method combines a bottom-up data-driven approach with top-down feedback provided by high-level semantic representations. The bottom-up approach is based on three main mechanisms: visual attention, area segmentation, and landmark characterization. As there is no segmentation method that works properly in every situation, we integrate multiple segmentation algorithms in order to increase the robustness of the approach. The top-down feedback is provided by two information sources: i) An estimation of the robot position that reduces the search scope for potential matches with previously selected landmarks, ii) A set of weights that, according to the results of previous recognitions, controls the influence of each segmentation algorithm in the recognition of each landmark. We test our approach with encouraging results in three datasets corresponding to real-world scenarios.
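The weighting of multiple segmentation algorithms by recognition feedback can be sketched as a multiplicative-update scheme: a segmenter's weight grows when it contributed to a successful landmark recognition and shrinks otherwise. The specific update rule and segmenter names below are assumptions for illustration, not the scheme derived in the paper.

def update_segmenter_weights(weights, successes, lr=0.2):
    # weights: {segmenter: w}; successes: {segmenter: True/False} from the
    # last recognition attempt. Multiplicative update, then renormalize.
    for seg, ok in successes.items():
        weights[seg] *= (1.0 + lr) if ok else (1.0 - lr)
    total = sum(weights.values())
    return {seg: w / total for seg, w in weights.items()}

w = {"meanshift": 1 / 3, "watershed": 1 / 3, "graphcut": 1 / 3}
w = update_segmenter_weights(w, {"meanshift": True, "watershed": False, "graphcut": True})
print(w)  # meanshift and graphcut gain influence for this landmark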
Unsupervised Anomaly Detection in Large Databases Using Bayesian Networks.
Cansado, A.; and Soto, A.
Applied Artificial Intelligence, 22(4): 309-330. 2008.
Paper
link
bibtex
abstract
@article{Cansado:Soto:2008, Author = {A. Cansado and A. Soto}, Title = {Unsupervised Anomaly Detection in Large Databases Using Bayesian Networks}, Journal = {Applied Artificial Intelligence}, Volume = {22}, Number = {4}, Pages = {309-330}, Year = {2008}, abstract = {Today, there has been a massive proliferation of huge databases storing valuable information. The opportunities for an effective use of these new data sources are enormous; however, the huge size and dimensionality of current large databases call for new ideas to scale up current statistical and computational approaches. This paper presents an application of Artificial Intelligence technology to the problem of automatic detection of candidate anomalous records in a large database. We build our approach with three main goals in mind: 1) An effective detection of the records that are potentially anomalous, 2) A suitable selection of the subset of attributes that explains what makes a record anomalous, and 3) An efficient implementation that allows us to scale the approach to large databases. Our algorithm, called Bayesian Network Anomaly Detector (BNAD), uses the joint probability density function (pdf) provided by a Bayesian Network (BN) to achieve these goals. By using appropriate data structures, advanced caching techniques, the flexibility of Gaussian Mixture models, and the efficiency of BNs to model joint pdfs, BNAD manages to efficiently learn a suitable BN from a large dataset. We test BNAD using synthetic and real databases, the latter from the fields of manufacturing and astronomy, obtaining encouraging results.}, url = {http://saturno.ing.puc.cl/media/papers_alvaro/Cansado-Soto-AAI-2007.pdf} }
Today, there has been a massive proliferation of huge databases storing valuable information. The opportunities for an effective use of these new data sources are enormous; however, the huge size and dimensionality of current large databases call for new ideas to scale up current statistical and computational approaches. This paper presents an application of Artificial Intelligence technology to the problem of automatic detection of candidate anomalous records in a large database. We build our approach with three main goals in mind: 1) An effective detection of the records that are potentially anomalous, 2) A suitable selection of the subset of attributes that explains what makes a record anomalous, and 3) An efficient implementation that allows us to scale the approach to large databases. Our algorithm, called Bayesian Network Anomaly Detector (BNAD), uses the joint probability density function (pdf) provided by a Bayesian Network (BN) to achieve these goals. By using appropriate data structures, advanced caching techniques, the flexibility of Gaussian Mixture models, and the efficiency of BNs to model joint pdfs, BNAD manages to efficiently learn a suitable BN from a large dataset. We test BNAD using synthetic and real databases, the latter from the fields of manufacturing and astronomy, obtaining encouraging results.
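Scoring records by how improbable they are under a learned joint density is the core idea behind BNAD. The sketch below substitutes a Gaussian mixture (via scikit-learn) for the paper's Bayesian network, so the model class and library are assumptions; only the scoring pattern carries over.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(1000, 4))    # bulk of the data
anomalies = rng.normal(6, 1, size=(5, 4))    # a few far-away records
data = np.vstack([normal, anomalies])

# Fit a joint density, then flag the records with the lowest log-likelihood.
model = GaussianMixture(n_components=3, random_state=0).fit(data)
scores = model.score_samples(data)           # per-record log p(x)
candidates = np.argsort(scores)[:5]          # most anomalous first
print(candidates)                            # indices 1000..1004 expected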
Real-Time Robot Localization In Indoor Environments Using Structural Information.
Espinace, P.; Soto, A.; and Torres-Torriti, M.
In IEEE Latin American Robotics Symposium (LARS), 2008.
Paper
link
bibtex
@inproceedings{Espinace:Soto:Torres:2008, Author = {P. Espinace and A. Soto and M. Torres-Torriti}, Title = {Real-Time Robot Localization In Indoor Environments Using Structural Information}, booktitle = {IEEE Latin American Robotics Symposium (LARS)}, pages = {}, year = {2008}, abstract = {}, url = {} }
Improving the Selection and Detection of Visual Landmarks Through Object Tracking.
Espinace, P.; and Soto, A.
In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Workshop on Visual Localization for Mobile Platforms, 2008.
Paper
link
bibtex
@inproceedings{Espinace:Soto:2008b, Author = {P. Espinace and A. Soto}, Title = {Improving the Selection and Detection of Visual Landmarks Through Object Tracking}, booktitle = {IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Workshop on Visual Localization for Mobile Platforms}, pages = {}, year = {2008}, abstract = {}, url = {} }
Human Detection in Indoor Environments Using Multiple Visual Cues and a Mobile Robot.
Pszczolkowski, S.; and Soto, A.
In Iberoamerican Congress on Pattern Recognition (CIARP), LNCS 4756, pages 350-359, 2008.
Paper
link
bibtex
@inproceedings{Pszczolkowski:Soto:2008, Author = {S. Pszczolkowski and A. Soto}, Title = {Human Detection in Indoor Environments Using Multiple Visual Cues and a Mobile Robot}, booktitle = {Iberoamerican Congress on Pattern Recognition (CIARP), LNCS 4756}, pages = {350-359}, year = {2008}, abstract = {}, url = {} }
Features: The more the better.
Mery, D.; and Soto, A.
In The 7th WSEAS International Conference on Signal Processing, Computational Geometry and Artificial Vision, 2008.
link
bibtex
@inproceedings{Mery:Soto:2008, author = {D. Mery and A. Soto}, booktitle = {The 7th WSEAS International Conference on Signal Processing, Computational Geometry and Artificial Vision}, Title = {Features: The more the better}, Year = {2008} }
2007
(4)
A Statistical approach to simultaneous mapping and localization for mobile robots.
Araneda, A.; Fienberg, S.; and Soto, A.
The Annals of Applied Statistics, 1(1): 66-84. 2007.
Paper
link
bibtex
@article{Araneda:Fienberg:Soto:2007, Author = {A. Araneda and S. Fienberg and A. Soto}, Title = {A Statistical approach to simultaneous mapping and localization for mobile robots}, journal = {The Annals of Applied Statistics}, volume = {1}, number = {1}, pages = {66-84}, year = {2007}, abstract = {}, url = {} }
Using Data Mining Techniques to Predict Industrial Wine Problem Fermentations.
Urtubia, A.; Perez-Correa, J. R.; Soto, A.; and Pszczolkowski, P.
Food Control, 18(12): 1512-1517. 2007.
Paper
link
bibtex
@article{Urtubia:EtAl:2008, Author = {A. Urtubia and J. R. Perez-Correa and A. Soto and P. Pszczolkowski}, Title = {Using Data Mining Techniques to Predict Industrial Wine Problem Fermentations}, journal = {Food Control}, volume = {18}, number = {12}, pages = {1512-1517}, year = {2007}, abstract = {}, url = {} }
An Accelerated Algorithm for Density Estimation in Large Databases, Using Gaussian Mixtures.
Soto, A.; Zavala, F.; and Araneda, A.
Cybernetics and Systems: An International Journal, 38(2): 123-139. 2007.
Paper
link
bibtex
abstract
@article{Soto:Zavala:Araneda:2007, Author = {A. Soto and F. Zavala and A. Araneda}, Title = {An Accelerated Algorithm for Density Estimation in Large Databases, Using Gaussian Mixtures}, journal = {Cybernetics and Systems: An International Journal}, volume = {38}, number = {2}, pages = {123-139}, year = {2007}, abstract = {Today, with the advances of computer storage and technology, there are huge datasets available, offering an opportunity to extract valuable information. Probabilistic approaches are especially suited to learn from data by representing knowledge as density functions. In this paper, we choose Gaussian Mixture Models (GMMs) to represent densities, as they possess great flexibility to adapt to a wide class of problems. The classical estimation approach for GMMs corresponds to the iterative algorithm of Expectation Maximization. This approach, however, does not scale properly to meet the highly demanding processing requirements of large databases. In this paper we introduce an EM-based algorithm that solves the scalability problem. Our approach is based on the concept of data condensation which, in addition to substantially diminishing the computational load, provides sound starting values that allow the algorithm to reach convergence faster. We also focus on the model selection problem. We test our algorithm using synthetic and real databases, and find several advantages when compared to other standard existing procedures.}, url = {http://saturno.ing.puc.cl/media/papers_alvaro/Felipe-07.pdf} }
Today, with the advances of computer storage and technology, there are huge datasets available, offering an opportunity to extract valuable information. Probabilistic approaches are especially suited to learn from data by representing knowledge as density functions. In this paper, we choose Gaussian Mixture Models (GMMs) to represent densities, as they possess great flexibility to adapt to a wide class of problems. The classical estimation approach for GMMs corresponds to the iterative algorithm of Expectation Maximization. This approach, however, does not scale properly to meet the highly demanding processing requirements of large databases. In this paper we introduce an EM-based algorithm that solves the scalability problem. Our approach is based on the concept of data condensation which, in addition to substantially diminishing the computational load, provides sound starting values that allow the algorithm to reach convergence faster. We also focus on the model selection problem. We test our algorithm using synthetic and real databases, and find several advantages when compared to other standard existing procedures.
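The data-condensation idea can be approximated with standard tools: summarize the dataset with many k-means centroids, use those condensed points to derive sound starting values, and then run EM on the full data from that initialization. The pipeline below is a simplification under those assumptions, not the paper's algorithm.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
data = np.vstack([rng.normal(-3, 1, size=(50000, 2)),
                  rng.normal(+3, 1, size=(50000, 2))])

# Step 1: condense the data into a small set of representative points.
centroids = KMeans(n_clusters=64, n_init=3, random_state=1).fit(data).cluster_centers_

# Step 2: derive starting means from the condensed set, so EM on the full
# data starts near convergence and needs far fewer iterations.
init = KMeans(n_clusters=2, n_init=1, random_state=1).fit(centroids).cluster_centers_
gmm = GaussianMixture(n_components=2, means_init=init, random_state=1).fit(data)
print(gmm.means_.round(2))  # approximately (-3, -3) and (+3, +3)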
Computer Vision for Quality Control in Latin American Food Industry, A Case Study.
Aguilera, J.; Cipriano, A.; Erana, M.; Lillo, I.; Mery, D.; Soto, A.; and Valdivieso, C.
In Int. Conf. on Computer Vision (ICCV): Workshop on Computer Vision Applications for Developing Countries, 2007.
Paper
link
bibtex
@inproceedings{Aguilera:EtAt:2007, Author = {JM. Aguilera and A. Cipriano and M. Erana and I. Lillo and D. Mery and A. Soto and C. Valdivieso}, Title = {Computer Vision for Quality Control in Latin American Food Industry, A Case Study}, booktitle = {Int. Conf. on Computer Vision (ICCV): Workshop on Computer Vision Applications for Developing Countries}, pages = {}, year = {2007}, abstract = {}, url = {} }
2006
(2)
Automatic Selection and Detection of Visual Landmarks Using Multiple Segmentations.
Langdon, D.; Soto, A.; and Mery, D.
In IEEE Pacific-Rim Symposium on Image and Video Technology (PSIVT), LNCS 4319, pages 601-610, 2006.
Paper
link
bibtex
@inproceedings{Langdon:Soto:Mery:2006, Author = {D. Langdon and A. Soto and D. Mery}, Title = {Automatic Selection and Detection of Visual Landmarks Using Multiple Segmentations}, booktitle = {IEEE Pacific-Rim Symposium on Image and Video Technology (PSIVT), LNCS 4319}, pages = {601-610}, year = {2006}, abstract = {}, url = {} }
A Mobile Robotics Course for Undergraduate Students in Computer Science.
Soto, A.; Espinace, P.; and Mitnik, R.
In IEEE Latin American Robotics Symposium, LARS, pages 187-192, 2006.
Paper
link
bibtex
@inproceedings{Soto:Espinace:Mitnik:2006, Author = {A. Soto and P. Espinace and R. Mitnik}, Title = {A Mobile Robotics Course for Undergraduate Students in Computer Science}, booktitle = {IEEE Latin American Robotics Symposium, LARS}, pages = {187-192}, year = {2006}, abstract = {}, url = {} }
2005
(2)
Importance Sampling in Mapping and Localization by a Mobile Robot.
Araneda, A.; and Soto, A.
In Workshop on Case Studies of Bayesian Statistics, 2005.
Paper
link
bibtex
@inproceedings{Araneda:Soto:2005, Author = {A. Araneda and A. Soto}, Title = {Importance Sampling in Mapping and Localization by a Mobile Robot}, booktitle = {Workshop on Case Studies of Bayesian Statistics}, year = {2005}, abstract = {}, url = {} }
Self Adaptive Particle Filter.
Soto, A.
In Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), pages 1398-1406, 2005.
Paper
link
bibtex
abstract
@inproceedings{Soto:2005, Author = {A. Soto}, Title = {Self Adaptive Particle Filter}, booktitle = {Proceedings of International Joint Conference on Artificial Intelligence (IJCAI)}, pages = {1398-1406}, year = {2005}, abstract = {The particle filter has emerged as a useful tool for problems requiring dynamic state estimation. The efficiency and accuracy of the filter depend mostly on the number of particles used in the estimation and on the propagation function used to re-allocate these particles at each iteration. Both features are specified beforehand and are kept fixed in the regular implementation of the filter. In practice this may be highly inappropriate since it ignores errors in the models and the varying dynamics of the processes. This work presents a self-adaptive version of the particle filter that uses statistical methods to adapt the number of particles and the propagation function at each iteration. Furthermore, our method presents a computational load similar to that of the standard particle filter. We show the advantages of the self-adaptive filter by applying it to a synthetic example and to the visual tracking of targets in a real video sequence.}, url = {Soto-IJCAI-05.pdf} }
The particle filter has emerged as a useful tool for problems requiring dynamic state estimation. The efficiency and accuracy of the filter depend mostly on the number of particles used in the estimation and on the propagation function used to re-allocate these particles at each iteration. Both features are specified beforehand and are kept fixed in the regular implementation of the filter. In practice this may be highly inappropriate since it ignores errors in the models and the varying dynamics of the processes. This work presents a self-adaptive version of the particle filter that uses statistical methods to adapt the number of particles and the propagation function at each iteration. Furthermore, our method presents a computational load similar to that of the standard particle filter. We show the advantages of the self-adaptive filter by applying it to a synthetic example and to the visual tracking of targets in a real video sequence.
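A generic way to adapt the particle count at run time, in the spirit of this abstract, is to monitor the effective sample size (ESS) of the weights and grow or shrink the particle set accordingly. The thresholds and doubling rule below are illustrative assumptions, not the statistical test proposed in the paper.

import numpy as np

def effective_sample_size(weights):
    w = weights / weights.sum()
    return 1.0 / np.sum(w ** 2)  # N for uniform weights, near 1 when degenerate

def adapt_particle_count(n, weights, n_min=100, n_max=10000):
    ess = effective_sample_size(weights)
    if ess < 0.5 * n:       # weights degenerate: spend more particles
        n = min(2 * n, n_max)
    elif ess > 0.9 * n:     # estimate comfortable: save computation
        n = max(n // 2, n_min)
    return n

w = np.array([0.9] + [0.1 / 999] * 999)   # highly peaked weight distribution
print(adapt_particle_count(1000, w))      # -> 2000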
2004
(4)
Statistical Inference in Mapping and Localization for Mobile Robots.
Araneda, A.; and Soto, A.
In Advances in Artificial Intelligence, Iberamia-04, LNAI 3315, pages 545-554, 2004.
Paper
link
bibtex
@inproceedings{Araneda:Soto:2004, Author = {A. Araneda and A. Soto}, Title = {Statistical Inference in Mapping and Localization for Mobile Robots}, booktitle = {Advances in Artificial Intelligence, Iberamia-04, LNAI 3315}, pages = {545-554}, year = {2004}, abstract = {}, url = {} }
Mobile Robotic Supported Collaborative Learning (MRSCL).
Mitnik, R.; Nussbaum, M.; and Soto, A.
In Advances in Artificial Intelligence, Iberamia-04, LNAI 3315, pages 912-921, 2004.
Paper
link
bibtex
@inproceedings{Mitnik:Nussbaum:Soto:2004, Author = {R. Mitnik and M. Nussbaum and A. Soto}, Title = {Mobile Robotic Supported Collaborative Learning (MRSCL)}, booktitle = {Advances in Artificial Intelligence, Iberamia-04, LNAI 3315}, pages = {912-921}, year = {2004}, abstract = {}, url = {} }
Detection of Rare Objects in Massive Astrophysical Data Sets Using Innovative Knowledge Discovery Technology.
Soto, A.; Cansado, A.; and Zavala, F.
In Astronomical Data Analysis Software & Systems Conf. Series (ADASS), pages 66-72, 2004.
Paper
link
bibtex
@inproceedings{Soto:Cansado:Zavala:2008, Author = {A. Soto and A. Cansado and F. Zavala}, Title = {Detection of Rare Objects in Massive Astrophysical Data Sets Using Innovative Knowledge Discovery Technology}, booktitle = {Astronomical Data Analysis Software & Systems Conf. Series (ADASS)}, pages = {66-72}, year = {2004}, abstract = {}, url = {} }
A Method to Adaptively Propagate the Set of Samples Used by Particle Filters.
Soto, A.
In Lecture Notes in Artificial Intelligence, LNAI 3040, pages 47-56, 2004.
Paper
link
bibtex
@inproceedings{Soto:2004, Author = {A. Soto}, Title = {A Method to Adaptively Propagate the Set of Samples Used by Particle Filters}, booktitle = {Lecture Notes in Artificial Intelligence, LNAI 3040}, pages = {47-56}, year = {2004}, abstract = {}, url = {} }
2003
(4)
Sequential Monte Carlo Methods for the Creation of Adaptive Software.
Soto, A.; and Khosla, P.
In 3rd Int. Workshop on Self-adaptive Software, 2003.
Paper
link
bibtex
@inproceedings{Soto:Khosla:2003a, Author = {A. Soto and P. Khosla}, Title = {Sequential Monte Carlo Methods for the Creation of Adaptive Software}, booktitle = {3rd Int. Workshop on Self-adaptive Software}, pages = {}, year = {2003}, abstract = {}, url = {} }
A Probabilistic Approach for Dynamic State Estimation Using Visual Information.
Soto, A.; and Khosla, P.
In Lecture Notes in Computer Science, LNCS 2821, pages 421-435, 2003.
Paper
link
bibtex
@inproceedings{Soto:Khosla:2003b, Author = {A. Soto and P. Khosla}, Title = {A Probabilistic Approach for Dynamic State Estimation Using Visual Information}, booktitle = {Lecture Notes in Computer Science, LNCS 2821}, pages = {421-435}, year = {2003}, abstract = {}, url = {} }
Adaptive Agent Based System for State Estimation Using Dynamic Multidimensional Information Sources.
Soto, A.; and Khosla, P.
In Lecture Notes in Computer Science, LNCS 2614, pages 66-83, 2003.
Paper
link
bibtex
@inproceedings{Soto:Khosla:2003c, Author = {A. Soto and P. Khosla}, Title = {Adaptive Agent Based System for State Estimation Using Dynamic Multidimensional Information Sources}, booktitle = {Lecture Notes in Computer Science, LNCS 2614}, pages = {66-83}, year = {2003}, abstract = {}, url = {} }
Probabilistic Adaptive Agent Based System for Dynamic State Estimation Using Multiple Visual Cues.
Soto, A.; and Khosla, P.
In Springer Tracts in Advanced Robotics (STAR), volume 6, pages 559-572, 2003.
Paper
link
bibtex
@inproceedings{Soto:Khosla:2003d, Author = {A. Soto and P. Khosla}, Title = {Probabilistic Adaptive Agent Based System for Dynamic State Estimation Using Multiple Visual Cues}, booktitle = {Springer Tracts in Advanced Robotics (STAR)}, volume = {6}, pages = {559-572}, year = {2003}, abstract = {}, url = {} }
2002
(2)
A Probabilistic Approach for the Adaptive Integration of Multiple Visual Cues Using an Agent Framework.
Soto, A.
Technical Report CMU-RI-TR-02-30 (PhD Thesis), Robotics Institute, School of Computer Science, Carnegie Mellon University, 2002.
Paper
link
bibtex
@techreport{Soto:2002, author = {A. Soto}, title = {A Probabilistic Approach for the Adaptive Integration of Multiple Visual Cues Using an Agent Framework}, number = {CMU-RI-TR-02-30}, institution = {Robotics Institute, School of Computer Science, Carnegie Mellon University}, note = {PhD Thesis}, year = {2002}, abstract = {}, url = {} }
Recent Advances in Distributed Tactical Surveillance.
Saptharishi, M.; Bhat, K.; Diehl, C.; Oliver, S.; Savvides, M.; Soto, A.; Dolan, J.; and Khosla, P.
In SPIE on Unattended Ground Sensor Technologies and Applications, Aerosense, 2002.
Paper
link
bibtex
@inproceedings{Saptharishi:EtAl:2002, Author = {M. Saptharishi and K. Bhat and C. Diehl and S. Oliver and M. Savvides and A. Soto and J. Dolan and P. Khosla}, Title = {Recent Advances in Distributed Tactical Surveillance}, booktitle = {SPIE on Unattended Ground Sensor Technologies and Applications, Aerosense}, pages = {}, year = {2002}, abstract = {}, url = {} }
1999
(3)
CyberATVs: Dynamic and Distributed Reconnaissance and Surveillance Using All Terrain UGVs.
Soto, A.; Saptharishi, M.; Dolan, J.; Trebi-Ollennu, A.; and Khosla, P.
In Proceedings of the International Conference on Field and Service Robotics (FSR), 1999.
Paper
link
bibtex
@inproceedings{Soto:EtAl:2002, Author = {A. Soto and M. Saptharishi and J. Dolan and A. Trebi-Ollennu and P. Khosla}, Title = {CyberATVs: Dynamic and Distributed Reconnaissance and Surveillance Using All Terrain UGVs}, booktitle = {Proceedings of the International Conference on Field and Service Robotics (FSR)}, pages = {}, year = {1999}, abstract = {}, url = {} }
An Effective Mobile Robot Educator with a Full-Time Job.
Nourbakhsh, I.; Bobenage, J.; Grange, S.; Lutz, R.; Meyer, R.; and Soto, A.
Artificial Intelligence, 114(1-2): 95-124. 1999.
Paper
link
bibtex
@article{Nourbakhsh:EtAl:1999, Author = {I. Nourbakhsh and J. Bobenage and S. Grange and R. Lutz and R. Meyer and A. Soto}, Title = {An Effective Mobile Robot Educator with a Full-Time Job}, journal = {Artificial Intelligence}, volume = {114}, number = {1-2}, pages = {95-124}, year = {1999}, abstract = {}, url = {} }
Distributed Tactical Surveillance with ATVs.
Dolan, J.; Trebi-Ollennu, A.; Soto, A.; and Khosla, P.
In SPIE on Unattended Ground Sensor Technologies and Applications, Aerosense, Vol. 3693, 1999.
Paper
link
bibtex
@inproceedings{Dolan:EtAl:1999, Author = {J. Dolan and A. Trebi-Ollennu and A. Soto and P. Khosla}, Title = {Distributed Tactical Surveillance with ATVs}, booktitle = {SPIE on Unattended Ground Sensor Technologies and Applications, Aerosense, Vol. 3693}, pages = {}, year = {1999}, abstract = {}, url = {} }
1998
(2)
A Scenario for Planning Visual Navigation of a Mobile Robot.
Soto, A.; and Nourbakhsh, I.
In American Association for Artificial Intelligence (AAAI), Fall Symposium Series, 1998.
Paper
link
bibtex
@inproceedings{Soto:Illah:1998, Author = {A. Soto and I. Nourbakhsh}, Title = {A Scenario for Planning Visual Navigation of a Mobile Robot}, booktitle = {American Association for Artificial Intelligence (AAAI), Fall Symposium Series}, pages = {}, year = {1998}, abstract = {}, url = {} }
A Real Time Visual Sensor for Supervision of Flotation Cells.
Cipriano, A.; Guarini, M.; Vidal, R.; Soto, A.; Sepúlveda, C.; Mery, D.; and Briseño, H.
Minerals Engineering, 11(6): 489-499. 1998.
Paper
link
bibtex
@article{Cipriano:EtAl:1998, Author = {A. Cipriano and M. Guarini and R. Vidal and A. Soto and C. Sepúlveda and D. Mery and H. Briseño}, Title = {A Real Time Visual Sensor for Supervision of Flotation Cells}, journal = {Minerals Engineering}, volume = {11}, number = {6}, pages = {489-499}, year = {1998}, abstract = {}, url = {} }
1997
(1)
Expert supervision of flotation cells using digital image processing.
Cipriano, A.; Guarini, M.; Soto, A.; Briseño, H.; and Mery, D.
In Proc. of 20th Int. Mineral Processing Congress, pages 281-292, 1997.
Paper
link
bibtex
@inproceedings{Cipriano:Et:Al:1997, Author = {A. Cipriano and M. Guarini and A. Soto and H. Briseño and D. Mery}, Title = {Expert supervision of flotation cells using digital image processing}, booktitle = {Proc. of 20th Int. Mineral Processing Congress}, pages = {281-292}, year = {1997}, abstract = {}, url = {} }
1996
(1)
Image processing applied to real time measurement of traffic flow.
Soto, A.; and Cipriano, A.
In Proc. of 28th Southeastern Symposium on System Theory, 1996.
Paper
link
bibtex
@inproceedings{Soto:Cipriano:1996, Author = {A. Soto and A. Cipriano}, Title = {Image processing applied to real time measurement of traffic flow}, booktitle = {Proc. of 28th Southeastern Symposium on System Theory}, pages = {}, year = {1996}, abstract = {}, url = {} }
1995
(1)
Measurement of physical characteristics of foam in flotation cells.
Guarini, M.; Soto, A.; Cipriano, A.; Guesalaga, A.; and Caceres, J.
In Proc. of Int. Conference: Copper-95, 1995.
Paper
link
bibtex
@inproceedings{Guarini:EtAl:1995, Author = {M. Guarini and A. Soto and A. Cipriano and A. Guesalaga and J. Caceres}, Title = {Measurement of physical characteristics of foam in flotation cells}, booktitle = {Proc. of Int. Conference: Copper-95}, pages = {}, year = {1995}, abstract = {}, url = {} }