Automatic Tagging

The article Automatic Detection of Tags for Political Blogs described a system created to automatically generate tags for political blogs.  Tags are generally used to quickly find blog posts related to specified keywords, and at present they are assigned by the user.  This can create problems, as it requires the user to identify every keyword within a post, reuse the same tag that was applied to that keyword in earlier posts, and take the time to apply the tags.  The system is flawed, and it often goes unused when bloggers are hurried or lazy.

In retrospect, I believe this was done on political blogs because it would be easiest to apply there.  Political blogs are very clear on their keywords and generally focus on the same ones, many of which are either perennial issues or tied to current events.  As such, their keywords are predictable and easy to identify.  An automatic tagging system in this case simply needs to look for politically important names and other keywords that are currently politically relevant.

As one who’s written in numerous blogs on numerous topics, I feel an automatic tagging system would be incredibly convenient.  Of course, with many (if not most) blogs, keyword identification may not be nearly as feasible as with political blogs.  As such, an automatic tagging system might not be ideal for most blogs.  On the other hand, if linguistic algorithms can be designed to identify the keywords, normalize the words, and apply tags with high accuracy, such a system could potentially be incredibly useful.  As the system would not be perfect, users should be able to edit, add, and remove tags, and to toggle automatic tagging off if they choose.  However, I know that I, among others, would much prefer automatic tagging.  As can be seen on this blog, I often don’t even bother with tags.  I often feel it’s an unnecessary waste of time, with the exception of widely read blogs on specific topics.  I will admit, though, that tagging enhances any blog, even personal ones, and automatic tagging would not be an unnecessary feature.
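The three steps described above (identify keywords, normalize them, apply tags, then let the user override) can be sketched in a few lines of Python.  This is only a minimal illustration, not the method from the article: the vocabulary, the tag names, and the function names are all made up for the example, and a real system would need much more than whole-word matching against a fixed list.

```python
import re

# Hypothetical vocabulary: each canonical tag maps to the surface forms
# that should trigger it. A real system would learn or curate this list.
TAG_VOCABULARY = {
    "healthcare": {"healthcare", "health care", "obamacare"},
    "elections": {"election", "elections", "primary", "primaries"},
    "economy": {"economy", "economic", "unemployment"},
}


def normalize(text):
    """Lowercase the text and collapse anything non-alphanumeric to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def suggest_tags(post, vocabulary=TAG_VOCABULARY):
    """Return the sorted canonical tags whose surface forms appear in the post."""
    # Pad with spaces so that only whole words or whole phrases match.
    padded = " " + normalize(post) + " "
    tags = set()
    for tag, forms in vocabulary.items():
        if any(" " + normalize(form) + " " in padded for form in forms):
            tags.add(tag)
    return sorted(tags)


def apply_user_edits(suggested, added=(), removed=()):
    """Let the blogger add or remove tags on top of the automatic suggestions."""
    return sorted((set(suggested) | set(added)) - set(removed))


suggested = suggest_tags("The primaries are over; health care reform is next.")
final = apply_user_edits(suggested, added=["senate"], removed=["elections"])
```

Because every surface form maps back to one canonical tag, the same keyword always gets the same tag, which is exactly the consistency problem with manual tagging.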

The fact that automatic detection of tags was designed for political blogs with high accuracy makes me feel that a full automatic tagging system might be possible to design and implement.  It is simply a matter of someone investing the time and energy to create such a system.


Filed under Uncategorized

Dialogue in Videogames – Developer’s End

Now, I know I just wrote a paper that critically looks at dialogue in video games.  I looked specifically at player dialogue in online games.  And yet that is not the only kind of dialogue that exists in video games, and a couple of the other projects that focused on language in video games got me thinking about other areas of dialogue in video games.  Specifically, I started thinking of the developer’s dialogue.

Perhaps I should be more accurate in what I mean by the developer’s dialogue.  I mean dialogue of non-playable characters (NPCs), and even the scripted dialogue of playable characters (PCs).  After all, I was originally a game design major with a minor in creative writing, and even now my current major is a blend of game design and creative writing.  I had a large interest in writing in video games, specifically script writing (or dialogue).

Dialogue is quite important in most genres of video games.  Some, like first person shooters, use dialogue to communicate missions, while others, like role playing games, are incredibly reliant on dialogue for the sake of storytelling.  In the case of massively multiplayer online games (MMOs), PC dialogue is usually absent and NPC dialogue is rarely as important as player dialogue.  Game play is not as story driven in these cases, as rewards and community seem to overshadow the story.  I know this from experience playing MMOs; players usually complete tasks either to reap rewards or to help others reap rewards.

Offline games tend to rely on dialogue to tell story, though there are exceptions.  Braid, an award winning Xbox Live Arcade game, is a beautiful example of such an exception.  Much of the story is read in books placed at the start of each world.  However, near the end of the game, the hero helps the princess (his lover) escape from a villain and reunites with her.  Well, it seems that way, except the game then reveals that the event was shown in reverse, and playing it back shows the princess was trying to escape from the “hero”, who was smothering her, and the “villain” was helping her get away from him.  This entire scene is conveyed with no dialogue, showing that dialogue is only one way for video games to tell stories.

Does this mean dialogue is in danger of being obsolete in single player video games?  Could different story-telling techniques ever become the norm for video games?  I highly doubt that.  Though online games make dialogue between players more and more important, most single player games seem to rely heavily on dialogue.

One of the most popular video games of the past year, Portal 2, relies heavily on dialogue, despite the fact that the protagonist is mute.   The villain, an insane AI by the name of GLaDOS, constantly degrades the protagonist, and her condescending insults have become so popular that there are even GPSes to replicate her voice and mannerisms.  A supporting character, an unintelligent robot named Wheatley, both guides and amuses players throughout the adventure.  Even a minor character, a robot obsessed with outer space (aptly named the Space Personality Core), has become a well-known internet meme.

I could, of course, give countless examples of dialogue usage in video games.  It’s a full career in itself.  Many games have multiple writers, as larger games may have tens of thousands of lines of dialogue, if not more.  Branching stories have added layers of complexity much deeper than Choose Your Own Adventure books ever could reach.  It’s definitely interesting to see where dialogue may go in future video games.  Perhaps it will be possible to have actual conversations with NPCs, eliminating the need for the PC to talk at all.  Either way, I’m sure there is a lot more to research on this particular subject.


Hashtag Popularity

Now I personally am not a Twitter user.  I do have a Twitter account, but I only used it for a short while before growing bored with it.  As such, I am in no way well-informed on the subject of hashtags.  I do still know how they are used, though.


The study on how hashtags propagate is definitely interesting.  However, I was left with more questions than answers.  I felt the study was very limited.  From what I gathered, they only picked three hashtags to evaluate, all of which were popular hashtags.  These hashtags followed the idea that the rich get richer.  Yet, wouldn’t the fact that the hashtags were already growing in popularity affect the results?  Even if these hashtags were randomly picked without prior knowledge of their popularity, three hashtags hardly represent the behavior of all hashtags.  Is it possible hashtags could fluctuate more?  Could there be ones that reach a certain popularity and then stay constant or even decrease?  Could hashtags remain unpopular for a while and suddenly gain some popularity?  Could other hashtags repeatedly increase and decrease in popularity over a course of time?
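To make the "rich get richer" idea concrete, here is a small Python sketch of that kind of growth, where each new tweet picks a hashtag with probability proportional to how often it has already been used.  This is my own toy model, not the one from the study; the function name, the starting counts, and the fixed seed are all assumptions made just for illustration.

```python
import random


def simulate_hashtag_growth(initial_counts, steps, seed=0):
    """Toy 'rich get richer' model: every new tweet uses one hashtag,
    chosen with probability proportional to its current usage count."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    counts = dict(initial_counts)
    tags = list(counts)
    for _ in range(steps):
        weights = [counts[tag] for tag in tags]
        chosen = rng.choices(tags, weights=weights, k=1)[0]
        counts[chosen] += 1
    return counts


# Three hypothetical hashtags that start out equally popular.
result = simulate_hashtag_growth({"#a": 1, "#b": 1, "#c": 1}, steps=100)
```

Notice that in a model like this a hashtag can never lose popularity or level off, which is exactly why the questions above matter: a study of three already-popular hashtags cannot tell us whether real hashtags plateau, fade, or come back.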

I could go on and on with questions.  I felt the research was very limited and more research could be done.  It did pique my interest, and the conclusions of the studies were logically sound in addition to being backed up by the research.  I just think there could be more research done with a broader range of hashtags.

At the same time, how relevant is this research?  Of course there are ways the results could be utilized, such as spreading information quickly in emergencies, but is the technology constant enough to be worth studying?  Will the results of these studies be obsolete in a decade?  Perhaps that can’t be answered, and the results might prove useful for future technologies.  Personally, sheer curiosity would be enough for me to carry out research in this direction.  The question is whether it would be an efficient use of time and money.


Internet Multimedia as Linguistics

It’s quite interesting how linguistics on the internet is becoming more and more dependent on other media.  Previously, written language was fully based on text, with the exception of cases where pictures were relevant (picture books, signs, etc.).  However, the internet has created a sort of network between all forms of media.

Of course, there are plenty of websites that are purely text.  Even websites that are text and images may use alt text, allowing the images to be replaced by text (primarily for use when the images cannot load and in cases of accessibility for the blind).  Yet, it is becoming more and more rare to find websites that only rely on text, or even text and images.

A perfect example of this level of multimedia is shown on Facebook.  Facebook users often rely on the ability to use many forms of media, many times seamlessly.  Pictures are uploaded, sometimes moments after being taken, and friends can be tagged in them, whether to link them to images of themselves or simply to grab their attention.  In the same way, videos can be uploaded, and, though Facebook has yet to support uploading audio, many users post YouTube videos to share music.  Apps allow people to not only participate in interactive media, but share them as well, potentially adding infinite other ways for media to be integrated into communication.

And then there’s the hyperlink.  This little key component of the world wide web provides a linguistic pathway heretofore unparalleled, and the applications of this pathway are endless.  This is the primary thread connecting the multimedia together, with text (and sometimes images) usually being the backbone of it all.  As an example, two friends on Facebook might have a conversation.  One might find an image to work better than text as a response, linking that.  Further into the conversation, the other friend might tag one of their friends to pull them into the conversation instead of communicating directly.  That friend might know of a related web page and choose to add another hyperlink.  In this way, potentially endless links are formed, most of which are intended as communication.  One can argue that, though it isn’t spoken or written language, these media are all being used as language.  I believe that, arguably, the internet has become one of the biggest changes to linguistics since the writing system was created.  It is giving the average person an incredible number of tools to creatively and functionally affect the very way they converse.

The number of media available to the internet user is still increasing.  Developments are being made in the areas of smell, taste, and touch.  We could grow to the point where we could download scents to fill our homes and share new recipes first by taste.  Hologram technology has even reached breakthroughs such as touchable holograms, meaning someday we might be able to download virtual items we could actually feel.  These technologies are advancing at an incredible pace.  How might they affect linguistics?


Twitter Propagating News

After reading about how Twitter was used to spread information in emergency situations, I started thinking about how applicable this could be.  It was already shown that there is a difference in the speed at which a tweet is retweeted based on whether the tweet is factual.  Simply integrating some sort of algorithm to allow users to see how accurate the information is could help the propagation.

I actually believe this is the future of emergency systems.  Twitter is perfect for getting across important information very quickly.  In situations of emergencies, the truth algorithm could kick in.  No longer would the public have to rely on news networks to gather and post information on the television, radio, and internet.  The public themselves could contribute any useful information and control the spread of factual information.

Granted, a lot of this was covered in the article we read for class, but I believe this is a tool that should be applied by Twitter.  Perhaps more studies should at least be carried out to find out if this is truly an accurate method.  Either way, I think it would be foolish to let this research go to waste.


Forensic Internet Linguistics

Evaluating internet conversations to identify potential criminals is a task I wish could be invested in more.   The internet makes criminal behavior such as stalking and acts of violence much easier while leaving the criminals hard to track down.  We constantly hear stories of pedophiles using chat rooms and social networking websites to connect with innocent teenagers and abuse them.  It is a real threat and one that could potentially be prevented.

David Crystal used some realistic methods of identifying predators.  Unfortunately, his corpus was relatively limited, as it is hard to obtain data from these cases.  I think it would not be an infringement of rights for this information to be shared to prevent these crimes.  It’s not for publication or even for sheer curiosity.  It’s for safety.

I also do not think it would infringe on rights to monitor children’s internet conversations.  Minors already do not have all the rights of an adult, and this is for protection.  Why shouldn’t their online activity be moderated, at least by computer programs looking for suggestive words such as the ones that Crystal used in his study?  It’s not to limit privacy.  Molestation and rape are real dangers.  I know individuals who have gone through these awful experiences, and it can scar people for life.

It is interesting how language usage can reveal intentions.  Isolated words aren’t necessarily bad, but how often they’re used can be enough to signal crimes before they occur.  Unfortunately, I have my doubts such a system could be perfect.  In rare cases, there may be a high level of suggestive words without any malicious intentions.  I would hope this wouldn’t lead to anyone being falsely criminalized.  I do not know why there would be a high level of suggestive words in such a case, but it’s a factor to consider.  Also, criminals aren’t necessarily unintelligent.  Ones that get away with their crimes may be skilled at being discreet, and so I am certain there would be predators who would find ways around the system.

Despite risks, I think that this research should be taken further.  It’s a huge advancement in the safety of children, which is something I do not believe is worth ignoring.  I don’t know how to easily obtain information and permission to carry this out, but I think it is something that truly needs to be done.  The internet has brought about a new medium for criminals, and it is important that, as we do with all other mediums, there be laws for the safety of individuals, especially children.


Online Ad Efficiency

Reading about how online ads work and the mistakes they can make really made me reflect on situations where I’ve seen them fail.  Though I’ve seen countless examples of this, a lot of the ones I remember clearly are from Facebook.  Many times I not only feel the ads I see on Facebook are irrelevant, but I also have trouble finding out where they come from… but perhaps further examination can clear up where they went wrong.


It’s nothing new for me to see ads for dating websites when on Facebook.  The best I can gather is that, because my profile says I’m single, Facebook assumes I’m looking for a girlfriend.  The ads are at least correct in assuming I’m a straight male, though that much can easily be gathered from my profile.  However, for about a year I constantly would see ads for Muslim dating sites.  I am in no way Muslim.  My profile states I’m a Christian, which explains why I sometimes see ads for Christian dating sites, but contradicts the ads for Muslim dating sites.  Sometimes ads appear because multiple friends have clicked like for them, but none of them have clicked like for any of the Muslim dating sites.  In fact, only one of my friends is Muslim, and I hardly talk to her (in addition to the fact that she lives in California, quite far from Rochester, NY).  I haven’t posted anything written in Arabic, nor do I appear to even look Muslim (or even Middle Eastern for that matter).  I simply cannot understand where the assumption came from that I would be interested in Muslim dating sites.  Quite clearly this was a failure on the advertising end, and it could easily have been rectified; an ad like this should appear only on profiles stating the individual is a single Muslim.

The worst part is that I clicked the “x” to get rid of the ads, stating the reason being they were against my views.  It would be great if that got rid of the ads permanently, but the next time I logged onto Facebook, the ads were back.  I tried multiple times to get rid of them, so clearly the system did not remember my preferences, which might not be a linguistic problem but is a problem nonetheless.


Another good example is an ad for a Mormon website, which appeared repeatedly on my profile for quite a while.  I am not a Mormon.  Once again, I am listed as a Christian.  Some consider the Mormon religion a branch of Christianity.  Ignoring my views on whether it is, and assuming the ad assumes it is, that does not mean all Christians are Mormons.  In other words, I gave the website no reason to believe I am a Mormon, nor that I am interested in becoming a Mormon.  I will admit I do have one friend that is a Mormon, and it is possible I was speaking to her often at the time this ad would appear, so perhaps that was triggering the ad.  However, it should not, as I had never expressed interest in becoming a Mormon, to her or anyone else.  Once again, this ad was falsely aimed at me.  Perhaps in this case, I had at some point used the word “Mormon” in talking to her, meaning it is very possible that the system identified that word and assumed I had an interest in the religion.  I feel this ad would be better targeted towards individuals who state, in their profile, that they are Mormons.

Once again, the “x” refused to get rid of the ad, leading me to believe that the “x” does not actually do anything, or that whatever it does is fairly inconsequential.  I do believe that the “x” could have a linguistic function, by identifying the key words of the ad, determining which part or parts of it were insignificant to me, and using that for future ad placement.  As far as I can identify, it just removes the ad for the rest of the online session.


Another example: ads for research needing gay men with HIV.  I only fill one part of that requirement, being that I am a man.  My profile clearly states I am interested in women, not men, meaning right off the bat it failed to use data that is very easy to obtain, wasting the advertising on someone who is openly straight.  The HIV part may not have actually been used for placement, as many with HIV don’t publicly state it.  Ideally, the ad could be shown to men stating they are gay and have HIV, and thrown out there sometimes for gay men who don’t state they have HIV, but there is no reason I should have gotten it, unless it was assuming I was a closeted gay male with HIV.  I gave it no reason to assume that, though, so I feel it was another wasted ad.


One of the most recent misplaced ads I’ve gotten on Facebook has been for getting help with my diabetes.  FYI, I do not have diabetes.  Once again, I hadn’t ever said anything about having diabetes.  Granted, this isn’t something that would be noted in one’s profile, but I’ve never posted anything on Facebook that should lead the system to believe I do have diabetes.  Another ad that seems to have failed.


I’ve seen Facebook ads fail more times than succeed.  I do not know how they decide their placement, but I do know not everyone receives the same ads.  Sometimes it guesses right.  Sometimes it notes things in my profile that I like, claiming another product is similar and that I should try it.  The products have ended up not being very similar, but the ads were placed correctly.  However, there are still glaring problems in the ad placement.  I think most of these could be fixed by having them look through key parts of the profile, rather than everything posted.  Others didn’t seem to look through anything, as far as I could tell, else they would not appear, as I indicated information that clearly contradicted the ads’ requirements.  For such a widely used website, Facebook’s ads are very inefficient.  There is a lot of marketing opportunity to tap into if someone would simply fix the system.
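Here is roughly what I mean by looking only at key parts of the profile, written as a small Python sketch.  To be clear, I have no idea how Facebook actually places ads; the field names, ad names, and functions below are all invented for the example.  The idea is simply that an ad is shown only when every one of its stated requirements matches a field in the profile, and that ads dismissed with the “x” stay dismissed.

```python
def ad_matches_profile(ad_requirements, profile):
    """An ad qualifies only if every required field is stated in the
    profile and has exactly the required value."""
    return all(
        profile.get(field) == value
        for field, value in ad_requirements.items()
    )


def select_ads(ads, profile, dismissed=frozenset()):
    """Return the names of ads to show, skipping any the user dismissed."""
    return [
        name
        for name, requirements in ads.items()
        if name not in dismissed and ad_matches_profile(requirements, profile)
    ]


# Hypothetical profile fields and ad requirements.
profile = {"relationship": "single", "religion": "Christian", "interested_in": "women"}
ads = {
    "christian_dating": {"relationship": "single", "religion": "Christian"},
    "muslim_dating": {"relationship": "single", "religion": "Muslim"},
    "diabetes_help": {"health_condition": "diabetes"},
}

shown = select_ads(ads, profile)
shown_after_dismissal = select_ads(ads, profile, dismissed={"christian_dating"})
```

With a rule this strict, an ad whose requirements contradict the profile, or depend on something the profile never states, simply never appears.  The tradeoff is that it would miss people who fit an ad but left the field blank, so a real system would probably need some middle ground.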


Week 3 – Does Language Imply Conscious Thought?

Before I begin working on my paper, I would like to discuss my initial feelings on language implying consciousness. This will be brief compared to my paper, and it will not be as heavy in content or arguments. This also isn’t going to directly parallel my paper.


What do I mean right now when I say conscious thought? The short answer is free will. We humans can choose our actions freely. A machine’s actions are predetermined based on how it is programmed. Even in a neural network with the ability to learn and evolve its thought processes, somewhere within its silicon “brain” it has a path that leads to its decision. It doesn’t have the ability to choose for itself.

Now let’s assume we have an AI system “smart” enough to create its own sentences that are both grammatically correct and hold understandable semantic meaning to humans or even other AI systems. Does this prove the machine has intelligence? Correct usage of language alone isn’t enough proof. Even the most primitive systems can be programmed to look for keywords, match them with “knowledge” in memory, and stick the knowledge into a grammatically ordered sentence. If language usage can prove the existence of consciousness, it would depend on the content of the language outputted. That is what I would like to examine in my paper. I want to analyze what semantics could hold pure mechanical meaning to the system and what semantics would not be able to be explained away as a technological charade.
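To show how little is needed for the “primitive system” described above, here is a tiny Python sketch of one.  The stored “knowledge”, the sentence template, and the function name are all made up for illustration; the point is only that it produces grammatical, meaningful-looking replies with nothing resembling thought behind them.

```python
import re

# Hypothetical "knowledge" keyed by the keyword that triggers it.
KNOWLEDGE = {
    "weather": "the forecast calls for rain",
    "time": "it is almost noon",
}


def respond(utterance, knowledge=KNOWLEDGE):
    """Find the first known keyword in the input and drop the stored
    fact into a fixed, grammatically ordered sentence template."""
    words = re.findall(r"[a-z']+", utterance.lower())
    for word in words:
        if word in knowledge:
            return f"Regarding the {word}, {knowledge[word]}."
    return "I do not know anything about that."


reply = respond("What is the weather like?")
```

Every path through this program is fixed in advance by the lookup table and the template, which is why well-formed output by itself says nothing about consciousness.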


Neural Nets

(I apologize for posting this later than I intended, I’ve been inconvenienced by computer problems.)

(Oh, and forgive me if I type too casually, by the way; from what I understand, this is a free write, but I hope techniques such as ellipses and sentence fragments are excusable, I feel much more free when I can write how I’d say it.)


Neural networks… how could one not find them fascinating? I suppose they could go over some people’s heads. Heck, I don’t even understand the technical intricacies of the systems themselves. Unfortunately, I am in no way a computer engineer, being only literate in software programming, and not even an expert at that. Still, think of the applications. Getting a neural network created that at least fairly accurately represents the human brain, even just the language center, could hold tremendous value in understanding the language acquisition process. Can we yet say with certainty what universal grammar truly entails? Couldn’t this knowledge not only be of scientific interest, but human interest? A full understanding of language development by observing a neural network can allow linguists to learn exactly how to best aid a child in first language acquisition, or even an adult in second or third language acquisition. Experiments could be carried out on these networks that would otherwise be impossible, as they would be unethical to carry out on a child.

Simply to be a part in creating a neural network that correctly mirrors the human mind would be incredible. Perhaps studying how language is used and acquired interests me more than the average person; however, this could be the missing piece of creating “true” AI. Granted, the book Galatea 2.2 implies this to be true, but it is, in a way, science fiction. We might not be anywhere near having this “true AI”, and I can’t say I understand the complexities of the brain enough to say what it would take to create real intelligence. Simply to have a computer communicate with a person on the same level as people communicate among themselves would be no small feat. The implications of such a breakthrough are probably impossible to really grasp. Would it be a godsend or a mistake? The simple curiosity of science might be enough to, in time, take the chance, even if it is a huge mistake. And I won’t lie, my curiosity is quite high here. I’m not very well researched in neural networks, though I definitely would like to learn more about them. It’s amazing how linguistics can be so pervasive, from the distant past to the far future.


Week 1: Ambiguity in Cyber-Language

I often communicate through the internet or texting, though not nearly as much as I used to. Unfortunately, these mediums cause a level of ambiguity with the lack of body language, facial expressions, and tone. Though this was confirmed in the second chapter of David Crystal’s Internet Linguistics, I already had realized this and have had to find ways to cope with the problem. The most obvious technique is emoticons, which I am quite liberal with. I probably use more emoticons than actual facial expressions, as they have to also make up for the loss of body language and tone. There are the obvious emoticons, such as a smile 🙂 and a frown 😦 . If I’m really sad, the frown is replaced with a crying face 😥 , or if I’m annoyed, I might use DX. If I’m really happy, a big smile is used 😀 . If I want to show that I’m joking, I use a silly face like :-p , or if I’m having fun, I might use XD . If I’m shocked I’ll use O.O or if I’m raising an eyebrow I’ll use o.O . A lot of joy is ^_^ and if I’m flattered ^^; . If I want to show sympathy or sadness with less effect, I use this face :-\ , which I can interpret but find hard to describe. Some of the faces become even harder to describe, or even just to say what emotion they represent, because I use them unconsciously, examples of such being >.< , x.x , and ><; . Those are but a handful, with some getting as complex as (//.^) .

Clearly I have a variety of emoticons to choose from, but the big problem is that not everyone interprets them the same way, nor do they have a similar set of emoticons. There are other ways to signify emotions, such as “lol” or “haha”, used not necessarily to indicate laughing but simply to defuse a situation and keep it lighthearted, almost like a nervous laugh, even when the person is not laughing. I also might write “imho”, or “in my humble opinion”, indicating that I do not mean to start a fight. Of course, in the end, there are quite a few factors that interpretation depends on, and so misunderstandings can always become a problem. Perhaps in the future, a more universalized mode of communicating will exist in cyberspace to help avoid these communication problems. Whether it will be technology or simply human adaptation to the medium remains unknown. I hope some solution does come, though, because these problems are more than just inconvenient; they can completely affect human relationships.
