Archive for April 2012

Machine learning for identification of cars

This is a handy getting-started guide to computer vision with R, e.g. on images from surveillance cameras. As always with R-bloggers, it contains the needed source code.



…just answered: Where are the Semantic web incubators? Any thoughts on building an economic ecosystem for Semantic web to keep momentum enough to attract si

My basic message is: as long as your startup wants to use semantic tools and infrastructures (vs. providing them), you likely do not need a specific incubator, as the Semantic Web only influences the tech part of your startup.

Semantic Web: Where are the Semantic web incubators? Any thoughts on building an economic ecosystem for Semantic web to keep momentum enough to attract sizable investment? 1 answer on Quora



How to work with Google n-gram data sets in R using MySQL

How to work with Google n-gram data sets in R using MySQL, via R-Bloggers. What I really like about this blog is its focus on interesting topics along with code examples to try out on your own.

N-gram data sets can also be created from your own texts using NLTK functions (see http://nltk.googlecode.com/svn/trunk/doc/howto/collocations.html ). In analytical use cases, n-grams give a machine a better basis for ‘understanding’ the meaning of a text than looking at the words individually.
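To make the idea concrete, here is a minimal sketch of building word n-grams from your own text. It deliberately uses only the Python standard library instead of NLTK, and the sample sentence and the helper name `ngrams` are my own assumptions for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample text; any tokenized text of your own works the same way.
text = "the semantic web is not the web of documents but the web of data"
tokens = text.split()

# Count how often each pair of neighbouring words (bigram) occurs.
bigram_counts = Counter(ngrams(tokens, 2))

# Print the most frequent word pairs with their counts.
for pair, count in bigram_counts.most_common(3):
    print(" ".join(pair), count)
```

Frequent pairs such as "the web" carry context that the single words "the" and "web" do not, which is the advantage described above.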

Update from 2012-04-10:

Stefan Keller ( http://twitter.com/sfkeller  ) pointed me to a blog entry about how to use n-grams in a PostgreSQL-based setting to optimize search functionality.
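The linked entry is not reproduced here. As a rough illustration of how n-grams can help search, the sketch below compares strings by their character trigrams, which is the general idea behind PostgreSQL's pg_trgm extension. That the entry uses this approach is my assumption. The padding and scoring are simplified and not an exact copy of what PostgreSQL does, and the function names and threshold are hypothetical.

```python
def trigrams(word):
    """Return the set of character trigrams of a lower-cased, padded word."""
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Share of trigrams the two strings have in common (Jaccard index)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def search(query, candidates, threshold=0.3):
    """Return candidates similar enough to the query, best match first."""
    scored = [(similarity(query, c), c) for c in candidates]
    return [c for score, c in sorted(scored, reverse=True) if score >= threshold]


# A misspelled query still finds the closest terms.
print(search("postgre", ["postgres", "mysql", "postgis"]))
```

The benefit for search is tolerance to typos and partial words: the query does not have to match a stored term exactly.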


knitR: Report generation with R

In data analysis projects it is often important to document results promptly and in an appealing form. A good way to do this is to track the progress of the experiments and their results in a dynamically generated document: knitR can help with that.

knitR irons out some weaknesses of the Sweave solution ( http://www.statistik.lmu.de/~leisch/Sweave/ ): in particular, good-looking graphics can be embedded more easily.

Further information on the tool can be found at http://www.inside-r.org/howto/knitr-elegant-flexible-and-fast-dynamic-report-generation-r , and a sample output can be seen at http://cloud.github.com/downloads/yihui/knitr/Stat615-Report1-Yihui-Xie.pdf .


Will web 3.0 be all about the use of a single taxonomy?

The Wikidata project ( http://meta.wikimedia.org/wiki/W… ) follows a path similar to the DBPedia project ( dbpedia.org ) in that it connects and collects the facts described in Wikipedia; the extent to which it will follow the Semantic Web and ontology standards is still open. (The current status can be seen at http://meta.wikimedia.org/wiki/W… .)

I support Jorn Barger's statement above: you can't build a universal ontology. But I would like to add that, for most of the use cases I have seen so far, THE one correct and complete ontology is not needed.
(And yes, pictures may help people identify things.)

To my understanding it is much more important to be able to work on your own ontology subset and to link it to more general, documented knowledge (e.g. linking your own terms to DBPedia concepts using the SKOS vocabulary, see http://www.w3.org/TR/skos-primer/ ).
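A minimal sketch of what such a link could look like: the script below writes a few own terms as N-Triples and maps them to DBPedia resources with the SKOS mapping properties `skos:closeMatch` and `skos:exactMatch`. The namespace `http://example.org/terms/`, the two terms and the chosen DBPedia resource names are assumptions for illustration only.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
DBPEDIA = "http://dbpedia.org/resource/"
OWN = "http://example.org/terms/"  # hypothetical namespace for your own terms

# own term -> (label, assumed DBPedia resource name, SKOS mapping property)
mappings = {
    "passenger-car": ("passenger car", "Car", "closeMatch"),
    "lorry": ("lorry", "Truck", "exactMatch"),
}


def to_ntriples(mappings):
    """Serialize labels and DBPedia links of the own terms as N-Triples."""
    lines = []
    for term, (label, dbpedia_name, relation) in sorted(mappings.items()):
        subject = f"<{OWN}{term}>"
        lines.append(f'{subject} <{SKOS}prefLabel> "{label}"@en .')
        lines.append(f"{subject} <{SKOS}{relation}> <{DBPEDIA}{dbpedia_name}> .")
    return "\n".join(lines)


print(to_ntriples(mappings))
```

The own vocabulary stays small and under your control, while the mapping triples connect it to the larger shared data set.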

(My experience from this kind of standardization project is that you might be able to manage the technical side of it, but the organizational and managerial aspects get very complicated once you target a single taxonomy.)


Silicon Valley, London, NYC: Startup Genome Data Reveals How The World's Top Tech Hubs Stack Up

Reblogged from TechCrunch:

Last year, we covered an ambitious collaborative R&D project called "Startup Genome," created by three young entrepreneurs, Bjoern Herrmann, Max Marmer, and Ertan Dogrultan. The goal of the ongoing project was (and is) to take a comprehensive, data-driven dive into what makes tech startups successful -- and not so successful.

Out of its research came, among other things, Startup Compass…


I like the clear structure along useful criteria; the placement of Berlin mirrors some of the misconceptions in the #Berlin hype. By the way, I will be in Tel Aviv from the 20th of April and hope to see some of the reasons for the deep blue colouring on the map.


A response to “The race for speed at the data layer” re. SAP HANA

I just wanted to post some remarks on the very interesting blog post by David Smith, as I was able to take a closer look at the HANA appliance:

  • I agree with the statement that tool providers today focus on ‘high-performance analytics’. But the most important step in the SAP/ERP world is still to be done: too much analytics domain information is still deeply buried in application code, and the BI tools of the past were seen merely as inspection tools for this information.
  • SAP is about to place more application logic on the database layer, which in the long run enables more of David’s “more than just basic analytics”: the use of (optimizable) prediction models could then become possible.
  • (I especially remember a very interesting use case for “more than just basic analytics” from an SAP discussion: appliances like HANA with specific application functionality enable a production company or facility to evaluate the ‘best’ scenario for fulfilling orders, taking into account the bills of material and facts such as the availability of parts when supplies are limited.)
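The order fulfillment use case in the last point can be illustrated with a toy sketch. It tries every combination of orders and keeps the one with the highest revenue that the parts in stock can cover. All products, quantities and prices are invented, and the brute-force search is only my assumption of how to show the idea on a small example; it says nothing about how HANA evaluates such scenarios.

```python
from collections import Counter
from itertools import combinations

# Hypothetical bills of material: product -> parts needed per unit.
bom = {
    "bike": {"frame": 1, "wheel": 2},
    "tricycle": {"frame": 1, "wheel": 3},
}
stock = {"frame": 3, "wheel": 7}  # parts currently available

# Hypothetical orders: (order id, product, quantity, revenue).
orders = [("A", "bike", 2, 400), ("B", "tricycle", 1, 350), ("C", "bike", 1, 250)]


def parts_needed(selected, bom):
    """Sum up the parts required to build all selected orders."""
    need = Counter()
    for _, product, qty, _ in selected:
        for part, per_unit in bom[product].items():
            need[part] += per_unit * qty
    return need


def best_scenario(orders, bom, stock):
    """Return ids and revenue of the best order set the stock can cover."""
    best, best_revenue = (), 0
    for size in range(1, len(orders) + 1):
        for selected in combinations(orders, size):
            need = parts_needed(selected, bom)
            feasible = all(stock.get(part, 0) >= n for part, n in need.items())
            revenue = sum(order[3] for order in selected)
            if feasible and revenue > best_revenue:
                best, best_revenue = selected, revenue
    return [order[0] for order in best], best_revenue


print(best_scenario(orders, bom, stock))
```

Trying all combinations grows exponentially with the number of orders, which is one reason why running such evaluations close to the data on a fast appliance is attractive.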

SAP in fact announced formal R integration:

The complete R integration was not present in the previews of HANA I have seen. The key to the success of R in the SAP world is the level of constraints placed on R, e.g. whether all the nice machine learning/Hadoop enablers for R can be used. (Small-scale R language support alone would not be sufficient for these use cases.)


Business idea: Put touristic activities in personal travel planning

With this blog entry I’ll start a series of business ideas that have come up around me, which I cannot pursue at the moment but which may be interesting for others.

Tagline: Offer people planning a trip leisure activities that fit their interests and their personal schedule.

Technology: Mashup of the APIs of travel planning tools (like tripit.com or dopplr.com ) with crawled/stored information about events, tourist activities etc., based on user profiling, e.g. from Facebook Likes.

Business models: mainly affiliate model (bringing guests to organizers of events/tour organizers)

Martin Hepp ( @mfhepp ), the author of the GoodRelations vocabulary for eBusiness, just posted a cookbook entry to show how business entities offering travel activities (outdoor, concerts etc.) can publish this information in a machine-readable way.

I think this is why he defined the Ticket Ontology to describe events, activities and their business impact.

But for the time being (as long as not many travel organizers make their activities machine-readable), a crucial technical part is collecting travel activities and establishing and maintaining connections with the business entities offering them.

Even this idea can make use of BigData analysis techniques: you can initially optimize, and later predict, which kind of activity is attractive to which group of users (a use case of customer segmentation).
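A very small sketch of the optimization step, assuming the user segments are already known (e.g. derived from profile data) and that each offer shown to a user is logged with whether it was booked. The segments, activity types and log entries are invented. Simple booking rates stand in here for a real prediction model.

```python
from collections import defaultdict

# Hypothetical log: (user segment, activity type, was the offer booked?)
events = [
    ("music fans", "concert", True),
    ("music fans", "concert", True),
    ("music fans", "concert", False),
    ("music fans", "hiking", False),
    ("outdoor fans", "hiking", True),
    ("outdoor fans", "hiking", True),
    ("outdoor fans", "concert", True),
    ("outdoor fans", "concert", False),
]


def booking_rates(events):
    """Share of shown offers that were booked, per (segment, activity)."""
    shown, booked = defaultdict(int), defaultdict(int)
    for segment, activity, was_booked in events:
        shown[(segment, activity)] += 1
        booked[(segment, activity)] += int(was_booked)
    return {key: booked[key] / shown[key] for key in shown}


def best_activity_per_segment(rates):
    """Pick the activity type with the highest booking rate per segment."""
    best = {}
    for (segment, activity), rate in rates.items():
        if segment not in best or rate > best[segment][1]:
            best[segment] = (activity, rate)
    return {segment: activity for segment, (activity, _) in best.items()}


rates = booking_rates(events)
print(best_activity_per_segment(rates))
```

With enough logged data, the same table of rates is the starting point for a model that predicts the attractiveness of an activity for segments it has not been shown to yet.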

 
