Tag Archive: deep learning

I am so proud that our financial news summarization model is within 4% of all the 8026 models (in terms of average monthly downloads) in Hugging Face! I would like to thank my colleagues Grigorios TsoumakasTatiana Passali and Alex Gidiotis!

By passing millions of ImageNet images through InceptionV1 (state-of-the-art deep convolutional neural network) we can extract the image patches that make specific neurons from various convolutional layers to activate mostly.

By projecting the image patches to 2D using UMAP we can see what the neural network “sees” at the various layers.

This is a great way for explaining how a computer vision model makes its classification decision.

However, the following part of the article was the reason for my post:

“….There is another phenomenon worth noting: not only are concepts being refined as you move from layer to layer, but new concepts seem to be appearing out of combinations of old ones….”

This is how a world of complexity works.

We know that deep neural networks perform hierarchical feature learning and combine simpler features to learn more complex ones. This is one of the reasons why we use deep learning for audio, visual and textual data.

Deep learning can decompose the complexity of data!

Have you ever asked why we randomly initialize the weights of a neural network?

After you read this post you will know why!

When the weights of the neurons in a neural network’s layer are initialized to the same value then all neurons of the layer produce the same output in the forward propagation.

Furthermore, when doing backpropagation the gradients of the loss w.r.t to the weights of the layer are also the same values. So, when training happens with gradient descent the weights change in the same way.

Lastly, when gradient descent converges, the weight matrix of the layer contains the same values for all neurons and thus the neurons have learned the same thing.

To break this symmetry and allow the neurons of the layer (and in general in all layers) to learn new and different features we randomly initialize the weights!

Despite the promise of Feature Learning in Deep Learning -where dense, low-dimensional and compressed representations can be learned automatically from high-dimensional raw data- usually Feature Engineering is the most important factor for the success of an ML project.

Among ML practitioners the best learning algorithms and models are well-known and most effort is done to transform the data in order to express as much as possible the useful parts that model best the underlying problem.

In other words, the success of an ML project depends mostly on the data representation and not model selection / tuning. When the features are not garbage usually even the simplest algorithms with default hyperparameter values can give good results.

Conditional Language Models are not used only in Text Summarization and Machine Translation. They can be used also for Image Captioning!

Here is a great example from Machine Learning Mastery of how we can connect the Feature Extraction component of a SOTA Computer Vision model (e.g., VGG, ResNet, Inception, Xception, etc) with the input of a Language Model in order to generate the caption of an image.

The whole deep learning architecture can be trained end-to-end. It is a simple encoder-decoder architecture but it can be extended and improved using an attention interface between encoder and decoder, or even using Transformer layers!

Adding attention not only enables the model to attend differently various parts of the input image but also explain its decisions. For each generated word in output caption we can visualize the attended visual part of input image.

Natural languages (speech and text) are the way we communicate as species. They help us to express whatever is inside to the outer world.

Natural languages are not designed. They emerge. Thus, they are messy and semi-structured. If they were designed, NLP would be already solved, using context-free grammars and finite automata by linguists 50 years ago.

Today, we are trying to artificially “learn” language from text using state-of-the-art Deep Neural Language Models that behave probabilistically, predicting the next token in a sequence.

Moreover, natural languages are not static. They evolve and change. Different words can be used in different times with different meaning. It is a moving target.

Plato, the Greek philosopher was negative with “languages” -despite the fact the he has written so much- because a language cannot express the fullness of a human mind, of a person. Socrates and many philosophers from the Peripatetic school never wrote texts. The only way they were communicate was by real human communication (body language, eyes, speech, touch). Only with this way, a human mind and heart can evolve and create new worlds.

However, we are living in a century where everything is either digitalised or written and human communication goes to minimum.


Some personal notes to all AI practitioners!

In Linear Regression when using the loss function MSE it is always a bowl-shaped convex function and gradient descent can always find the global minima.

In Logistic Regression if we use the MSE then it will not be a convex function because the hypothesis function is non-linear (it uses a sigmoidal activation). Thus, it will be harder for gradient descent to find the global minima. However, if we use the cross-entropy loss it will be convex and gradient descent can easily converge to global minima!

Support Vector Machines have also convex loss function.

We should always use a convex loss function so that gradient descent can converge to the global minima (local optima free).

Neural Networks are very complex non-linear mathematical functions and the loss function most often is non-convex, thus it is usual to stuck in a local minima. However, most optimization problems in Neural Networks are due to long plateau and saddle points rather than local minima. For such problems advanced gradient descent optimization variants were invented (eg: Momentum, Adam, RMSprop).

Happy optimizations!


E. Chatzikyriakidis, C. Papaioannidis and I. Pitas, “Adversarial Face De-Identification,” 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 2019, pp. 684-688


Anastasios Tefas



Presentation topic: “Content-based Image Retrieval”

Presenter: Efstathios Chatzikyriakidis

PDF presentation: https://bit.ly/39FVpNg

Source code: https://bit.ly/38Q5XKv

Presentation topic: “Adversarial Face De-identification”

Presenter: Efstathios Chatzikyriakidis

PDF presentation: https://bit.ly/3ik1nHC

Experiments (exported files): https://bit.ly/3qrzdx7