Introduction
In this article, we explore three approaches to fetching data from SQL databases in Go. We’ll start with the conventional method based on JSON marshaling, move on to a more flexible dynamic approach using maps, and finish with an optimized technique that minimizes overhead and maximizes performance. Each method reveals a different balance between simplicity, flexibility, and efficiency — essential trade-offs for any Go developer working with databases.
In my practice, there were very frequent cases where I had to retrieve data from a database with an unknown structure beforehand. This often happens in the e-commerce industry.
We will consider 3 methods of retrieving data, along with their advantages and disadvantages.
You can get all the code of the examples from the repository
https://github.com/oleg578/dbf
Data preparing
For demonstration purposes, we’ll use MariaDB, though the examples apply equally well to MySQL or any other SQL-compatible database. The focus is on the Go code and data-handling patterns, not on the specific database engine.
Code for create MariaDB docker instance and seed test data are in https://github.com/oleg578/dbf/tree/main/db
All examples are tested on Ubuntu 24.04.1 LTS (Noble Numbat) with 11th Gen Intel Core i5–1135G7 and 16GiB RAM.
Standard way
The standard way is trivial — we fetch rows from database into array and then Marshal
it
into JSON
struct
rs, errRs := con.QueryContext(ctx, q, numbRows)
...
for rs.Next() {
var dmy = Dummy{}
if err := rs.Scan(
&dmy.ID,
&dmy.Product,
&dmy.Description,
&dmy.Price,
&dmy.Qty,
&dmy.Date); err != nil {
panic(err)
}
result = append(result, dmy)
}
if err := rs.Err(); err != nil {
panic(err)
}
msg, errRTJ := json.Marshal(result)
if errRTJ != nil {
panic(errRTJ)
}
...
_, errOut := os.Stdout.Write(msg)
...
What about speed?
Test result:
% ./db2json_struct 10000000 1>/dev/null
Elapsed time: 12631 ms, HeapAlloc = 5400.725 MB, Sys = 7099.447 MB
10000000 records read
Let’s just remember this as a starting point.
Using map way
Next we consider fetching unknown list of columns — like “SELECT * FROM …”.
The sequence of actions is simple.
Each record will be represent as map[string]interface{}, then
// create result slice
// numbRows is number of rows in result
outRows := make([]map[string]interface{}, 0, numbRows)
We will not serialize each record to save program execution time, and our actions are
not
complex.
See https://github.com/oleg578/dbf/tree/mapping/example
After fetch rows from database, we will request an slice of columns
columns, err := rs.Columns()
Create slice of values and slice of pointers for data
values := make([]interface{}, len(columns))
valuePointers := make([]interface{}, len(values))
for i := range values {
valuePointers[i] = &values[i]
}
Then for each row we get the map which represent model — https://github.com/oleg578/dbf/blob/mapping/sql2json.go
func Row2Map(columns []string, values []interface{}) (map[string]interface{}, error) {
rowMap := make(map[string]interface{})
if len(columns) != len(values) {
return nil, errors.New("columns and values length not equal")
}
for i, col := range columns {
rowMap[col] = assignCellValue(values[i]) // we will help to typify the value
}
return rowMap, nil
}
func assignCellValue(val interface{}) interface{} {
if b, ok := val.([]byte); ok {
if floatValue, err := strconv.ParseFloat(string(b), 64); err == nil {
return floatValue
}
return string(b)
}
return val
}
Note:
You may want to pay attention to the function assign CellValue — it's purpose is
to
pre-assign a type to column values. Simple trick —
this function tells the JSON encoder which values to accept as
non-numeric.
Benchmark:
cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
BenchmarkRow2Map
BenchmarkRow2Map-8 7376358 159.0 ns/op 336 B/op 2 allocs/op
Finally, execution time of our example — https://github.com/oleg578/dbf/blob/mapping/example/main.go
% ./db2json_map 10000000 1>/dev/null
Elapsed time: 12152 ms, HeapAlloc = 8966.899 MB, Sys = 12248.287 MB
10000000 records read
What can we say about these results?
Benefits — We can obtain structures that are not predefined.
Disadvantages — there are several.
But we pay for this with memory consumption.
It’s easy to see that such an implementation consumes 1.5 times more memory.
The second disadvantage which can give us a headache are data types. We must define
what
we will access
as numeric data, and what we will define as string. In current we use AssignValue
auxiliary
function:
func assignCellValue(val interface{}) interface{} {
if b, ok := val.([]byte); ok {
if floatValue, err := strconv.ParseFloat(string(b), 64); err == nil {
return floatValue
}
return string(b)
}
return val
}
If value can be represented as float — then we define it as interface value (json library will define it as numeric or null), else as string.
The third disadvantage is map structure property — we can’t guarantee order of fields in json. But this disadvantage may be unimportant.
We cannot say that the result we got is satisfactory.He devours memory. On small
datasets
this may be
acceptable, but this can negatively affect processing with large amounts of
data.
https://github.com/oleg578/dbf/tree/mapping
What can we improve?
Let’s look at the weak points in our algorithm of actions when we use maps.
This a creating map[string]interface{} for each row — this very expensive operation
in
terms of
resources and processor time;
And another too expensive operation — JSON marshaling the final slice which can be
very
big.
It’s time to think about improvement
Let’s play with data structures
When fetching data from a SQL table, the order of columns is always defined by the database — deterministic and consistent across rows. This means we can skip the overhead of maps and instead rely on two coordinated slices: one holding column names, the other holding their corresponding values. It’s simpler, faster, and perfectly aligned with the way the database delivers data.
The next refinement is to request values as byte slices and retrieve column metadata
via
ColumnTypes.
This approach gives us type information up front while keeping data handling
lightweight
—
an efficient foundation for streaming, type conversion, or dynamic encoding later
on.
columns, err := rs.ColumnTypes()
if err != nil {
log.Fatalf("fault get column types: %v", err)
}
values := make([]sql.RawBytes, len(columns))
valuePointers := make([]interface{}, len(values))
for i := range values {
valuePointers[i] = &values[i]
}
So, we are ready to fetch data in a new way and it’s time to serialize this data.
The JSON library is heavy, but we can make serialization easier.
Columns are simple token-level data usually — we can turn them into JSON strings,
then we
will just
escape special symbols by our escape function -
func escape(in []byte) []byte {
var out bytes.Buffer
for _, b := range in {
switch b {
case '\n', '\r', '\t', '\b', '\f', '\\', '"':
out.WriteByte('\\')
out.WriteByte(b)
case '/':
out.WriteByte('\\')
out.WriteByte(b)
default:
if b < 32 || b == 127 {
out.WriteString(`\u00`)
out.WriteString(strconv.FormatInt(int64(b), 16))
} else {
out.WriteByte(b)
}
}
}
return out.Bytes()
}
Now let’s think about data types. We can determine the type of column (using sql
ColumnType structure
sql.ColumnType)
func isDigit(c *sql.ColumnType) bool {
switch c.DatabaseTypeName() {
case "TINYINT":
return true
case "SMALLINT":
return true
case "MEDIUMINT":
return true
case "BIGINT":
return true
case "INT":
return true
case "INT1":
return true
case "INT2":
return true
case "INT3":
return true
case "INT4":
return true
case "INT8":
return true
case "BOOL":
return true
case "BOOLEAN":
return true
case "DECIMAL":
return true
case "DEC":
return true
case "NUMERIC":
return true
case "FIXED":
return true
case "NUMBER":
return true
case "FLOAT":
return true
case "DOUBLE":
return true
default:
return false
}
}
And finally, let’s apply primitive serialization:
func Row2Json(columns []*sql.ColumnType, values []sql.RawBytes) (string, error) {
if len(values) == 0 {
return "", errors.New("no data in values")
}
if len(columns) != len(values) {
return "", errors.New("columns and values length not equal")
}
var buff strings.Builder
buff.WriteByte('{')
for i, val := range values {
buff.WriteByte('"')
buff.WriteString(columns[i].Name())
buff.WriteByte('"')
buff.WriteByte(':')
if len(val) > 0 {
if !isDigit(columns[i]) {
buff.WriteByte('"')
}
buff.Write(escape(val))
if !isDigit(columns[i]) {
buff.WriteByte('"')
}
} else {
buff.WriteString("null")
}
if i != len(values)-1 {
buff.WriteByte(',')
}
}
buff.WriteByte('}')
return buff.String(), nil
}
Benchmark:
cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
BenchmarkRow2Json
BenchmarkRow2Json-8 2881545 385.3 ns/op 440 B/op 9 allocs/op
Use UNIX way to output
We will use UNIX way of output of our program — i.e. we will not create an output
slice —
we will out
data into standard output stream instead — then we can use the output as a standard
pipe
in UNIX:
// create new buffer
writer := bufio.NewWriter(os.Stdout)
defer writer.Flush()
writer.WriteByte('[') //start print array of data
...
msg, errMsg := dbf.Row2Json(columns, values)
...
if _, err := writer.WriteString(msg); err != nil {
log.Fatalf("fault write row: %v", err)
} // write serialized row
...
writer.WriteByte(']') // finish serialized slice
In success execution we will get something like:
% ./db2json_ds 3 2>/dev/null | jq
[
{
"id": 1,
"product": "product_1",
"description": null,
"price": 1.23,
"qty": 10,
"date": "2021-01-01 00:00:00"
},
{
"id": 2,
"product": "product_2",
"description": null,
"price": 2.23,
"qty": 20,
"date": "2021-01-01 00:00:00"
},
{
"id": 3,
"product": "product_3",
"description": null,
"price": 3.23,
"qty": 30,
"date": "2021-01-01 00:00:00"
}
]
It’s a time of the moment of truth — fetch 10 million records:
% ./db2json_ds 10000000 1>/dev/null
Elapsed time: 11894 ms, HeapAlloc = 2.436 MB, Sys = 11.710 MB
10000000 records read
Let’s compare with the starting point:
- Execution time — 11.647 seconds instead 12.631 seconds;
- Memory consumption — 11.710 MB instead 7099.447 MB. So, up to 10 percent faster and 600 times less memory consumption.
Conclusion
Let’s examine the broader scope of the tests.
Test comparison table (the length of result json file is 1.2Gb)
Benchmarks comparison table:
Real test memory consumption comparison table:
The fast method (fetch data using slices) is not the fastest in benchmarks, but it’s fastest and the least gluttonous in real tests.
Quantitatively, the slice-based approach delivers:
- Performance comparison: Full 10M-row export finishes in 11.647 s versus 12.631 s for the internal library method (Standard) (~7.8% faster).
- Speed comparison: Throughput grows from 0.79 million to 0.86 million rows per second (~8.4% gain).
- Memory consumption: Peak heap usage shrinks from 7.1 GB to 11.7 MB (≈99.8% reduction, about 600× smaller).
Analyzing the results, we can make two important conclusions, in my opinion :
- simple solutions always work more effectively;
- real tests are always more important than synthetic ones.
Happy coding 🙂