NLP
A few days ago, I was looking into new technologies around the neighbourhood. I’ve worked for about 6 years in the tech industry, most of the time working with one technology or another, or simply solving problems tied to a specific context.
Because of the pandemic and its impact on the entire world, time off became mandatory, so here we are: let’s invest some time in digging into something new.
How did this start?
Well, first of all, have you ever tried your hand at text analysis in this new era of tweets and SMS? If you’re as curious about it as I am, welcome: this is going to be your place.
What will we be using?
We’ll be using a Python library called TextBlob, which seems to be one of the most common and useful libraries for beginners. One thing you need to ask yourself is: what do I want to solve with this? In my case, I was wondering about sentiment analysis with this library in particular.
What else will we need?
Just some curiosity. I’ll be taking this material as a base. Nothing here will be really complex; I just wanted to keep it fun for myself, because I really love language and written communication above all.
Let’s play:
First of all, we’ll create our requirements.txt. In our case we’ll be using pip as the dependency manager, so this file collects all the libraries we need. For this article, the only entry it really needs is textblob.
That file lives in our root directory. In the same place we’ll add two more things we need in order to proceed with our mission:
main.py : The source code lives here. This is going to be a new file.
data : The dataset that supports our playground lives here. This is going to be a directory called data.
In the end, your root directory will contain requirements.txt, main.py and the data directory.
You’re ready to go: just install the libraries locally, and make sure you have a Python environment installed on your devbox.
Let’s fill in the missing gaps:
Install the dependencies. If, like me, you have pip installed on your devbox, you just need to type something like this:
pip install -r requirements.txt
Now let’s fill in the source code. We won’t dig into anything very complex, because right now we’re more interested in getting this working ASAP so we can start analysing more complex sentences and datasets.
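A minimal sketch of what main.py could look like, assuming the input text comes from the command line. The function names analyze_text and format_sentiment are my own, picked for illustration; the only TextBlob feature used is the sentiment property, which returns a polarity and a subjectivity score.

```python
import sys


def analyze_text(text):
    """Return (polarity, subjectivity) for the given text using TextBlob."""
    # Imported here so the module can still be loaded (for example by a test
    # of format_sentiment) on a machine where TextBlob is not installed yet.
    from textblob import TextBlob

    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def format_sentiment(polarity, subjectivity):
    """Render both scores on one line, rounded to two decimals."""
    return f"polarity: {polarity:.2f}, subjectivity: {subjectivity:.2f}"


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Everything after the script name is treated as the input text.
        scores = analyze_text(" ".join(sys.argv[1:]))
        print(format_sentiment(*scores))
    else:
        print('usage: python main.py "some text to analyse"')
```

To try it, run something like python main.py "I really love written communication" from the root directory, after installing the dependencies.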
As a result, you’ll see the polarity and subjectivity scores for the input text.
What are the new missing gaps?
- We started designing a piece of software that takes an “input text” and performs an analysis on it.
- As a result, we get an output with two values:
polarity: An index ranging from -1 to 1, where -1 is the most negative and 1 is the most positive.
subjectivity: An index ranging from 0 to 1, where 0.0 is very objective and 1.0 is very subjective.
- What can be inferred from the polarity and subjectivity of a phrase? At this moment I really don’t know what they mean in practice, but we’ll cover some aspects in later parts of this series. For now, taking a social approach, I’m guessing something like this:
Let’s say we have a Cartesian plane with its 4 quadrants, with polarity on one axis and subjectivity on the other.
Would it be valid to establish something very basic like that?
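One way to play with the quadrant idea is a tiny helper that places a (polarity, subjectivity) pair on that plane. This is only a sketch under my own assumptions: polarity is split at 0, subjectivity is split at a hypothetical cutoff of 0.5 (its range is 0 to 1, so it has no natural zero crossing), and the labels are made up for illustration. None of this comes from TextBlob itself.

```python
def classify_quadrant(polarity, subjectivity, subjectivity_cutoff=0.5):
    """Place a (polarity, subjectivity) pair on a simple four-quadrant plane.

    The 0.5 cutoff and the labels are illustrative assumptions, not
    something defined by TextBlob.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must be between -1 and 1")
    if not 0.0 <= subjectivity <= 1.0:
        raise ValueError("subjectivity must be between 0 and 1")

    # Horizontal axis: negative on the left, positive on the right.
    if polarity > 0:
        tone = "positive"
    elif polarity < 0:
        tone = "negative"
    else:
        tone = "neutral"  # sits exactly on the axis, in no quadrant

    # Vertical axis: objective at the bottom, subjective at the top.
    stance = "subjective" if subjectivity >= subjectivity_cutoff else "objective"
    return tone, stance


if __name__ == "__main__":
    for scores in [(0.8, 0.9), (-0.3, 0.1), (0.0, 0.5)]:
        print(scores, "->", classify_quadrant(*scores))
```

Whether these four boxes say anything meaningful about real tweets is exactly the open question; the helper only makes the guess concrete enough to test against a dataset later.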
What’s in the next steps?
It will be fun and meaningful to find new insights into how people connect with each other in an environment like Twitter, with its 280-character limit. Would it be possible to detect an account or a tweet under attack from haters?
For now, honestly, I don’t know, but let’s prepare something more meaningful for the next article. I’ll be writing a series of articles dedicated to this specific topic in the coming days. The next one may include a Twitter dataset to perform a broader analysis.
Still need more info? I’m sure you do. Take a look into it, and of course feel free to open up the conversation.
EDIT:
I’m including an excellent reference guide to learn more about this topic:
https://neptune.ai/blog/sentiment-analysis-python-textblob-vs-vader-vs-flair
If you’re looking for other alternatives, I encourage you to take a look at the neptune.ai blog; they cover plenty of material to learn a little more about this.
https://www.reddit.com/r/LanguageTechnology/comments/a4nfia/as_a_self_learner_and_beginner_how_to_actually/
https://www.nltk.org/
http://www.liwc.net/liwcespanol/