Topic on Talk:Naming things/Flow

camelCase vs snake_case vs hyphens

Ottomata (talk | contribs)

Reposted from https://phabricator.wikimedia.org/T281499#7085780


Especially when naming symbols for data (which almost everything is), I'd really like to stress that using capital letters in data keys is a really bad idea.

https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines#Identifier_Naming_Rules

Data moves around. It will be used in different languages with different typing and different naming rules. It will certainly be used in SQL systems, which are for the most part case-insensitive. The only common identifier naming rule that will function in all of these systems is snake_case.

Any time data passes through a case-insensitive system, it will be normalized, most likely to all lower case.

Fields like isPartOf and mainEntity will become ispartof and mainentity. Longer names that include acronyms get even worse. In camelCase, it isn't clear what the acronym capitalization rules are, e.g. HTTPURLID? HttpUrlId? Whatever the camelCase acronym rule is, the name will be normalized in SQL systems to e.g. httpurlid. Data integration automation code has to reason about which fields are the same. When ingesting data that has capital letters, it is possible for two different fields to end up normalized to the same lower-cased name. At that point we can only guess how to ingest the data.
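
To make the collision problem concrete, here is a minimal sketch in Python. It assumes the case-insensitive system normalizes by plain lowercasing; the helper name find_case_collisions and the sample field list are made up for illustration and do not come from any real pipeline.

```python
from collections import defaultdict


def find_case_collisions(field_names):
    """Group field names that become identical once lowercased.

    Hypothetical helper: assumes the downstream system normalizes
    identifiers by lowercasing them.
    """
    groups = defaultdict(list)
    for name in field_names:
        groups[name.lower()].append(name)
    # Keep only normalized names that more than one original name maps to.
    return {norm: names for norm, names in groups.items() if len(names) > 1}


# Illustrative field names only.
fields = ["HTTPURLID", "HttpUrlId", "isPartOf", "mainEntity", "http_url_id"]
collisions = find_case_collisions(fields)
print(collisions)
```

With this input, the two camelCase spellings collapse into the single name httpurlid, while the snake_case field http_url_id passes through unchanged and collides with nothing.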

Every time someone needs to move camelCased data identifiers through case-insensitive systems, they will have to write code that reasons about the case changes. If we avoid upper-cased field names in our schemas, we are less likely to encounter bugs and breakages in data pipelines.
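
As a rough illustration of the kind of code people end up writing, here is one possible camelCase to snake_case converter in Python. It is only a sketch: the function name camel_to_snake and the two regex rules are my own assumptions about how one might split words, not an agreed convention, and the acronym handling is exactly the guesswork described above.

```python
import re


def camel_to_snake(name):
    """Best-effort camelCase -> snake_case conversion (hypothetical helper)."""
    # Split between a lowercase letter or digit and an uppercase letter:
    # "isPartOf" -> "is_Part_Of"
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    # Split an acronym run from a following capitalized word:
    # "HTTPRequest" -> "HTTP_Request"
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", s)
    return s.lower()


print(camel_to_snake("isPartOf"))
print(camel_to_snake("incomingHTTPRequestIpAddress"))
print(camel_to_snake("HttpUrlId"))
print(camel_to_snake("HTTPURLID"))
```

Under these rules HttpUrlId becomes http_url_id, but HTTPURLID stays httpurlid because nothing in the name marks where one acronym ends and the next begins. The same logical field gets two different names depending on which capitalization rule the producer happened to follow.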

Additionally, I've heard that camelCase can be difficult for non-native English speakers. incomingHTTPRequestIpAddress (which is normalized to incominghttprequestipaddress) is (subjectively) more difficult to read than incoming_http_request_ip_address.


Is it worth adding this to this Naming_Things page?

Jdforrester (WMF) (talk | contribs)

I feel like this page is more focussed on 'naming things for humans' than on naming parameters/variables/inputs for technical systems, which fits better under Manual:Coding conventions, I suppose?
